
FOUNDATIONS 

OF THE 

THEORY OF PROBABILITY 



BY 

A. N. KOLMOGOROV 



TRANSLATION EDITED BY 

NATHAN MORRISON 



CHELSEA PUBLISHING COMPANY 
NEW YORK 

1950 






COPYRIGHT 1950 BY 
CHELSEA PUBLISHING COMPANY 



PRINTED IN U.S.A. 






EDITOR'S NOTE 

In the preparation of this English translation of Professor
Kolmogorov's fundamental work, the original German monograph
Grundbegriffe der Wahrscheinlichkeitsrechnung, which appeared
in the Ergebnisse der Mathematik in 1933, and also a Russian
translation by G. M. Bavli published in 1936 have been used.

It is a pleasure to acknowledge the invaluable assistance of 
two friends and former colleagues, Mrs. Ida Rhodes and Mr. 
D. V. Varley, and also of my niece, Gizella Gross. 

Thanks are also due to Mr. Roy Kuebler who made available 
for comparison purposes his independent English translation of 
the original German monograph. 

Nathan Morrison 









PREFACE 

The purpose of this monograph is to give an axiomatic 
foundation for the theory of probability. The author set himself 
the task of putting in their natural place, among the general 
notions of modern mathematics, the basic concepts of probability 
theory — concepts which until recently were considered to be quite 
peculiar. 

This task would have been a rather hopeless one before the 
introduction of Lebesgue's theories of measure and integration. 
However, after Lebesgue's publication of his investigations, the 
analogies between measure of a set and probability of an event, 
and between integral of a function and mathematical expectation 
of a random variable, became apparent. These analogies allowed 
of further extensions; thus, for example, various properties of 
independent random variables were seen to be in complete analogy 
with the corresponding properties of orthogonal functions. But 
if probability theory was to be based on the above analogies, it 
still was necessary to make the theories of measure and integra- 
tion independent of the geometric elements which were in the 
foreground with Lebesgue. This has been done by Fréchet.

While a conception of probability theory based on the above 
general viewpoints has been current for some time among certain 
mathematicians, there was lacking a complete exposition of the 
whole system, free of extraneous complications. (Cf., however,
the book by Fréchet, [2] in the bibliography.)

I wish to call attention to those points of the present exposition 
which are outside the above-mentioned range of ideas familiar to 
the specialist. They are the following: Probability distributions 
in infinite-dimensional spaces (Chapter III, § 4) ; differentiation 
and integration of mathematical expectations with respect to a 
parameter (Chapter IV, § 5) ; and especially the theory of condi- 
tional probabilities and conditional expectations (Chapter V). 
It should be emphasized that these new problems arose, of neces- 
sity, from some perfectly concrete physical problems. 1 



¹ Cf., e.g., the paper by M. Leontovich quoted in footnote 6 on p. 46; also the
joint paper by the author and M. Leontovich, Zur Statistik der kontinuierlichen
Systeme und des zeitlichen Verlaufes der physikalischen Vorgänge. Phys. Jour.
of the USSR, Vol. 3, 1933, pp. 35-63.




The sixth chapter contains a survey, without proofs, of some 
results of A. Khinchine and the author of the limitations on the 
applicability of the ordinary and of the strong law of large num- 
bers. The bibliography contains some recent works which should 
be of interest from the point of view of the foundations of the 
subject. 

I wish to express my warm thanks to Mr. Khinchine, who 
has read carefully the whole manuscript and proposed several 
improvements. 

Kljasma near Moscow, Easter 1933. 

A. Kolmogorov 



CONTENTS

Editor's Note
Preface

I. Elementary Theory of Probability
§ 1. Axioms
§ 2. The relation to experimental data
§ 3. Notes on terminology
§ 4. Immediate corollaries of the axioms; conditional probabilities; Theorem of Bayes
§ 5. Independence
§ 6. Conditional probabilities as random variables; Markov chains

II. Infinite Probability Fields
§ 1. Axiom of Continuity
§ 2. Borel fields of probability
§ 3. Examples of infinite fields of probability

III. Random Variables
§ 1. Probability functions
§ 2. Definition of random variables and of distribution functions
§ 3. Multi-dimensional distribution functions
§ 4. Probabilities in infinite-dimensional spaces
§ 5. Equivalent random variables; various kinds of convergence

IV. Mathematical Expectations
§ 1. Abstract Lebesgue integrals
§ 2. Absolute and conditional mathematical expectations
§ 3. The Tchebycheff inequality
§ 4. Some criteria for convergence
§ 5. Differentiation and integration of mathematical expectations with respect to a parameter

V. Conditional Probabilities and Mathematical Expectations
§ 1. Conditional probabilities
§ 2. Explanation of a Borel paradox
§ 3. Conditional probabilities with respect to a random variable
§ 4. Conditional mathematical expectations

VI. Independence; The Law of Large Numbers
§ 1. Independence
§ 2. Independent random variables
§ 3. The Law of Large Numbers
§ 4. Notes on the concept of mathematical expectation
§ 5. The Strong Law of Large Numbers; convergence of a series

Appendix — Zero-or-one law in the theory of probability

Bibliography



Chapter I 



ELEMENTARY THEORY OF PROBABILITY 

We define as elementary theory of probability that part of 
the theory in which we have to deal with probabilities of only a 
finite number of events. The theorems which we derive here can 
be applied also to the problems connected with an infinite number 
of random events. However, when the latter are studied, essen- 
tially new principles are used. Therefore the only axiom of the 
mathematical theory of probability which deals particularly with 
the case of an infinite number of random events is not introduced 
until the beginning of Chapter II (Axiom VI). 

The theory of probability, as a mathematical discipline, can 
and should be developed from axioms in exactly the same way 
as Geometry and Algebra. This means that after we have defined 
the elements to be studied and their basic relations, and have 
stated the axioms by which these relations are to be governed, 
all further exposition must be based exclusively on these axioms, 
independent of the usual concrete meaning of these elements and 
their relations. 

In accordance with the above, in § 1 the concept of a field of 
probabilities is defined as a system of sets which satisfies certain 
conditions. What the elements of this set represent is of no im- 
portance in the purely mathematical development of the theory 
of probability (cf. the introduction of basic geometric concepts 
in the Foundations of Geometry by Hilbert, or the definitions of 
groups, rings and fields in abstract algebra). 

Every axiomatic (abstract) theory admits, as is well known, 
of an unlimited number of concrete interpretations besides those 
from which it was derived. Thus we find applications in fields of 
science which have no relation to the concepts of random event 
and of probability in the precise meaning of these words. 

The postulational basis of the theory of probability can be 
established by different methods in respect to the selection of 
axioms as well as in the selection of basic concepts and relations. 
However, if our aim is to achieve the utmost simplicity both in 




the system of axioms and in the further development of the 
theory, then the postulational concepts of a random event and 
its probability seem the most suitable. There are other postula- 
tional systems of the theory of probability, particularly those in 
which the concept of probability is not treated as one of the basic 
concepts, but is itself expressed by means of other concepts. 1 
However, in that case, the aim is different, namely, to tie up as 
closely as possible the mathematical theory with the empirical 
development of the theory of probability. 

§ 1. Axioms²

Let E be a collection of elements ξ, η, ζ, . . . , which we shall call
elementary events, and 𝔉 a set of subsets of E; the elements of
the set 𝔉 will be called random events.

I. 𝔉 is a field³ of sets.

II. 𝔉 contains the set E.

III. To each set A in 𝔉 is assigned a non-negative real number
P(A). This number P(A) is called the probability of the event A.

IV. P(E) equals 1.

V. If A and B have no element in common, then

$$P(A + B) = P(A) + P(B).$$

A system of sets, 𝔉, together with a definite assignment of
numbers P(A), satisfying Axioms I-V, is called a field of prob-
ability.

Our system of Axioms I-V is consistent. This is proved by the
following example. Let E consist of the single element ξ and let 𝔉
consist of E and the null set 0. P(E) is then set equal to 1 and
P(0) equals 0.



¹ For example, R. von Mises [1] and [2] and S. Bernstein [1].

² The reader who wishes from the outset to give a concrete meaning to the
following axioms is referred to § 2.

³ Cf. Hausdorff, Mengenlehre, 1927, p. 78. A system of sets is called a field
if the sum, product, and difference of two sets of the system also belong to the
same system. Every non-empty field contains the null set 0. Using Hausdorff's
notation, we designate the product of A and B by AB; the sum by A + B in
the case where AB = 0; and in the general case by A ∪ B; the difference of
A and B by A − B. The set E − A, which is the complement of A, will be denoted
by Ā. We shall assume that the reader is familiar with the fundamental rules
of operations of sets and their sums, products, and differences. All sets of 𝔉
will be designated by Latin capitals.




Our system of axioms is not, however, complete, for in various 
problems in the theory of probability different fields of proba- 
bility have to be examined. 

The Construction of Fields of Probability. The simplest fields
of probability are constructed as follows. We take an arbitrary
finite set $E = \{\xi_1, \xi_2, \ldots, \xi_k\}$ and an arbitrary set $\{p_1, p_2, \ldots, p_k\}$
of non-negative numbers with the sum $p_1 + p_2 + \cdots + p_k = 1$.
𝔉 is taken as the set of all subsets in E, and we put

$$P\{\xi_{i_1}, \xi_{i_2}, \ldots, \xi_{i_\lambda}\} = p_{i_1} + p_{i_2} + \cdots + p_{i_\lambda}.$$

In such cases, $p_1, p_2, \ldots, p_k$ are called the probabilities of the
elementary events $\xi_1, \xi_2, \ldots, \xi_k$, or simply elementary probabili-
ties. In this way are derived all possible finite fields of probability
in which 𝔉 consists of the set of all subsets of E. (The field of
probability is called finite if the set E is finite.) For further
examples see Chap. II, § 3.
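A minimal computational sketch of this construction (in Python; the particular set E and the elementary probabilities are arbitrary choices made for illustration):

    # A finite field of probability: elementary events with elementary
    # probabilities that are non-negative and sum to 1.
    E = ['xi1', 'xi2', 'xi3']
    p = {'xi1': 0.5, 'xi2': 0.25, 'xi3': 0.25}

    def P(A):
        """P{xi_i1, ..., xi_il} = p_i1 + ... + p_il."""
        return sum(p[xi] for xi in A)

    # F is the set of all subsets of E; Axioms IV and V can be checked directly:
    assert P(E) == 1.0                      # Axiom IV
    A, B = {'xi1'}, {'xi2', 'xi3'}          # two disjoint events
    assert P(A | B) == P(A) + P(B)          # Axiom V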

§ 2. The Relation to Experimental Data⁴

We apply the theory of probability to the actual world of 
experiments in the following manner: 

1) There is assumed a complex of conditions, 𝔖, which allows
of any number of repetitions.

2) We study a definite set of events which could take place as
a result of the establishment of the conditions 𝔖. In individual
cases where the conditions are realized, the events occur, gener-
ally, in different ways. Let E be the set of all possible variants
ξ₁, ξ₂, . . . of the outcome of the given events. Some of these vari-
ants might in general not occur. We include in set E all the vari-
ants which we regard a priori as possible.

3) If the variant of the events which has actually occurred 



4 The reader who is interested in the purely mathematical development of 
the theory only, need not read this section, since the work following it is based 
only upon the axioms in § 1 and makes no use of the present discussion. Here 
we limit ourselves to a simple explanation of how the axioms of the theory of 
probability arose and disregard the deep philosophical dissertations on the 
concept of probability in the experimental world. In establishing the premises 
necessary for the applicability of the theory of probability to the world of 
actual events, the author has used, in large measure, the work of R. v. Mises, 
[1] pp. 21-27. 




upon realization of conditions 𝔖 belongs to the set A (defined in
any way), then we say that the event A has taken place.

Example: Let the complex 𝔖 of conditions be the tossing of a
coin two times. The set of events mentioned in Paragraph 2) con-
sists of the fact that at each toss either a head or tail may come up.
From this it follows that only four different variants (elementary
events) are possible, namely: HH, HT, TH, TT. If the "event A"
connotes the occurrence of a repetition, then it will consist of the
happening of either the first or the fourth of the four elementary
events. In this manner, every event may be regarded as a set of
elementary events.

4) Under certain conditions, which we shall not discuss here,
we may assume that to an event A which may or may not occur
under conditions 𝔖, is assigned a real number P(A) which has
the following characteristics:

(a) One can be practically certain that if the complex of con-
ditions 𝔖 is repeated a large number of times, n, then if m be the
number of occurrences of event A, the ratio m/n will differ very
slightly from P(A).

(b) If P(A) is very small, one can be practically certain that
when conditions 𝔖 are realized only once, the event A would not
occur at all.

The Empirical Deduction of the Axioms. In general, one may
assume that the system 𝔉 of the observed events A, B, C, . . . to
which are assigned definite probabilities, forms a field containing
as an element the set E (Axioms I, II, and the first part of
III, postulating the existence of probabilities). It is clear that
$0 \le m/n \le 1$, so that the second part of Axiom III is quite natural.
For the event E, m is always equal to n, so that it is natural to
postulate P(E) = 1 (Axiom IV). If, finally, A and B are non-
intersecting (incompatible), then $m = m_1 + m_2$, where $m, m_1, m_2$
are respectively the number of experiments in which the events
A + B, A, and B occur. From this it follows that

$$\frac{m}{n} = \frac{m_1}{n} + \frac{m_2}{n}.$$

It therefore seems appropriate to postulate that P(A + B) =
P(A) + P(B) (Axiom V).
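Principle (a) can be watched at work in a small simulation (a Python sketch; the value P(A) = 0.3 is an arbitrary assumption):

    import random

    random.seed(1)
    P_A = 0.3                        # assumed probability of the event A
    for n in (100, 10_000, 1_000_000):
        m = sum(random.random() < P_A for _ in range(n))   # occurrences of A
        print(n, m / n)              # the ratio m/n settles near P(A)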




Remark 1. If two separate statements are each practically 
reliable, then we may say that simultaneously they are both reli- 
able, although the degree of reliability is somewhat lowered in the 
process. If, however, the number of such statements is very large, 
then from the practical reliability of each, one cannot deduce any- 
thing about the simultaneous correctness of all of them. Therefore 
from the principle stated in (a) it does not follow that in a very 
large number of series of n tests each, in each the ratio m/n will 
differ only slightly from P(A). 

Remark 2. To an impossible event (an empty set) corre-
sponds, in accordance with our axioms, the probability P(0) = 0⁵,
but the converse is not true: P(A) = 0 does not imply the im-
possibility of A. When P(A) = 0, from principle (b) all we can
assert is that when the conditions 𝔖 are realized but once, event
A is practically impossible. It does not at all assert, however, that
in a sufficiently long series of tests the event A will not occur. On
the other hand, one can deduce from the principle (a) merely that
when P(A) = 0 and n is very large, the ratio m/n will be very
small (it might, for example, be equal to 1/n).

§ 3. Notes on Terminology 

We have defined the objects of our future study, random 
events, as sets. However, in the theory of probability many set- 
theoretic concepts are designated by other terms. We shall give 
here a brief list of such concepts. 

Theory of Sets and Random Events

1. Theory of sets: A and B do not intersect, i.e., AB = 0.
   Random events: Events A and B are incompatible.

2. Theory of sets: AB ⋯ N = 0.
   Random events: Events A, B, . . . , N are incompatible.

3. Theory of sets: AB ⋯ N = X.
   Random events: Event X is defined as the simultaneous occurrence of events A, B, . . . , N.

4. Theory of sets: A + B + ⋯ + N = X.
   Random events: Event X is defined as the occurrence of at least one of the events A, B, . . . , N.

⁵ Cf. § 4, Formula (3).






5. Theory of sets: The complementary set Ā.
   Random events: The opposite event Ā, consisting of the non-occurrence of event A.

6. Theory of sets: A = 0.
   Random events: Event A is impossible.

7. Theory of sets: A = E.
   Random events: Event A must occur.

8. Theory of sets: The system 𝔄 of the sets A₁, A₂, . . . , Aₙ forms a decomposition of the set E if A₁ + A₂ + ⋯ + Aₙ = E. (This assumes that the sets Aᵢ do not intersect in pairs.)
   Random events: Experiment 𝔄 consists of determining which of the events A₁, A₂, . . . , Aₙ occurs. We therefore call A₁, A₂, . . . , Aₙ the possible results of experiment 𝔄.

9. Theory of sets: B is a subset of A: B ⊂ A.
   Random events: From the occurrence of event B follows the inevitable occurrence of A.



§ 4. Immediate Corollaries of the Axioms; Conditional
Probabilities; Theorem of Bayes

From $A + \bar{A} = E$ and Axioms IV and V it follows that

$$P(A) + P(\bar{A}) = 1, \qquad (1)$$
$$P(\bar{A}) = 1 - P(A). \qquad (2)$$

Since $\bar{E} = 0$, then, in particular,

$$P(0) = 0. \qquad (3)$$

If A, B, . . . , N are incompatible, then from Axiom V follows
the formula (the Addition Theorem)

$$P(A + B + \cdots + N) = P(A) + P(B) + \cdots + P(N). \qquad (4)$$

If P(A) > 0, then the quotient

$$P_A(B) = \frac{P(AB)}{P(A)} \qquad (5)$$

is defined to be the conditional probability of the event B under
the condition A.

From (5) it follows immediately that 




$$P(AB) = P(A)\,P_A(B). \qquad (6)$$

And by induction we obtain the general formula (the Multi-
plication Theorem)

$$P(A_1 A_2 \ldots A_n) = P(A_1)\,P_{A_1}(A_2)\,P_{A_1 A_2}(A_3) \cdots P_{A_1 A_2 \ldots A_{n-1}}(A_n). \qquad (7)$$

The following theorems follow easily:

$$P_A(B) \ge 0, \qquad (8)$$
$$P_A(E) = 1, \qquad (9)$$
$$P_A(B + C) = P_A(B) + P_A(C). \qquad (10)$$

Comparing formulae (8)-(10) with Axioms III-V, we find that
the system 𝔉 of sets together with the set function $P_A(B)$ (pro-
vided A is a fixed set) forms a field of probability, and therefore
all the above general theorems concerning P(B) hold true for the
conditional probability $P_A(B)$ (provided the event A is fixed).
It is also easy to see that

$$P_A(A) = 1. \qquad (11)$$

From (6) and the analogous formula

$$P(AB) = P(B)\,P_B(A)$$

we obtain the important formula

$$P_B(A) = \frac{P(A)\,P_A(B)}{P(B)}, \qquad (12)$$

which contains, in essence, the Theorem of Bayes.

The Theorem on Total Probability: Let $A_1 + A_2 + \cdots + A_n = E$
(this assumes that the events $A_1, A_2, \ldots, A_n$ are mutually
exclusive) and let X be arbitrary. Then

$$P(X) = P(A_1)\,P_{A_1}(X) + P(A_2)\,P_{A_2}(X) + \cdots + P(A_n)\,P_{A_n}(X). \qquad (13)$$

Proof:

$$X = A_1 X + A_2 X + \cdots + A_n X;$$

using (4) we have

$$P(X) = P(A_1 X) + P(A_2 X) + \cdots + P(A_n X),$$

and according to (6) we have at the same time

$$P(A_i X) = P(A_i)\,P_{A_i}(X).$$

The Theorem of Bayes: Let $A_1 + A_2 + \cdots + A_n = E$ and
X be arbitrary; then

$$P_X(A_i) = \frac{P(A_i)\,P_{A_i}(X)}{P(A_1)\,P_{A_1}(X) + P(A_2)\,P_{A_2}(X) + \cdots + P(A_n)\,P_{A_n}(X)}, \qquad (14)$$

$$i = 1, 2, 3, \ldots, n.$$



$A_1, A_2, \ldots, A_n$ are often called "hypotheses" and formula
(14) is considered as the probability $P_X(A_i)$ of the hypothesis
$A_i$ after the occurrence of event X. [$P(A_i)$ then denotes the
a priori probability of $A_i$.]

Proof: From (12) we have

$$P_X(A_i) = \frac{P(A_i)\,P_{A_i}(X)}{P(X)}.$$

To obtain the formula (14) it only remains to substitute for the
probability P(X) its value derived from (13) by applying the
theorem on total probability.
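A numerical sketch of formulas (13) and (14) (Python; the probabilities below are invented for illustration):

    # Hypotheses A_1, A_2, A_3 decomposing E, and an arbitrary event X.
    P_A  = [0.5, 0.3, 0.2]           # a priori probabilities P(A_i)
    P_XA = [0.10, 0.40, 0.80]        # conditional probabilities P_{A_i}(X)

    # Theorem on total probability, formula (13):
    P_X = sum(pa * pxa for pa, pxa in zip(P_A, P_XA))

    # Theorem of Bayes, formula (14): a posteriori probabilities P_X(A_i).
    P_AX = [pa * pxa / P_X for pa, pxa in zip(P_A, P_XA)]

    print(P_X)                       # ≈ 0.33
    print(P_AX)                      # the posteriors; their sum is 1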

§ 5. Independence 

The concept of mutual independence of two or more experi- 
ments holds, in a certain sense, a central position in the theory of 
probability. Indeed, as we have already seen, the theory of 
probability can be regarded from the mathematical point of view 
as a special application of the general theory of additive set func- 
tions. One naturally asks, how did it happen that the theory of 
probability developed into a large individual science possessing 
its own methods? 

In order to answer this question, we must point out the spe- 
cialization undergone by general problems in the theory of addi- 
tive set functions when they are proposed in the theory of 
probability. 

The fact that our additive set function P(A) is non-negative
and satisfies the condition P(E) = 1, does not in itself cause new
difficulties. Random variables (see Chap. III) from a mathe-
matical point of view represent merely functions measurable with
respect to P(A), while their mathematical expectations are
abstract Lebesgue integrals. (This analogy was explained fully
for the first time in the work of Fréchet⁶.) The mere introduction
of the above concepts, therefore, would not be sufficient to pro- 
duce a basis for the development of a large new theory. 

Historically, the independence of experiments and random 
variables represents the very mathematical concept that has given 
the theory of probability its peculiar stamp. The classical work
of Laplace, Poisson, Tchebychev, Markov, Liapounov, Mises, and



⁶ See Fréchet [1] and [2].




Bernstein is actually dedicated to the fundamental investigation 
of series of independent random variables. Though the latest 
dissertations (Markov, Bernstein and others) frequently fail to 
assume complete independence, they nevertheless reveal the 
necessity of introducing analogous, weaker, conditions, in order 
to obtain sufficiently significant results (see in this chapter § 6, 
Markov chains) . 

We thus see, in the concept of independence, at least the germ 
of the peculiar type of problem in probability theory. In this 
book, however, we shall not stress that fact, for here we are 
interested mainly in the logical foundation for the specialized 
investigations of the theory of probability. 

In consequence, one of the most important problems in the 
philosophy of the natural sciences is — in addition to the well- 
known one regarding the essence of the concept of probability 
itself — to make precise the premises which would make it possible 
to regard any given real events as independent. This question, 
however, is beyond the scope of this book. 



Let us turn to the definition of independence. Given n experi-
ments 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾, that is, n decompositions

$$E = A_1^{(i)} + A_2^{(i)} + \cdots + A_{r_i}^{(i)}, \qquad i = 1, 2, \ldots, n,$$

of the basic set E. It is then possible to assign $r = r_1 r_2 \cdots r_n$ proba-
bilities (in the general case)

$$p_{q_1 q_2 \ldots q_n} = P\bigl(A_{q_1}^{(1)} A_{q_2}^{(2)} \cdots A_{q_n}^{(n)}\bigr) \ge 0,$$

which are entirely arbitrary except for the single condition⁷ that

$$\sum_{q_1, q_2, \ldots, q_n} p_{q_1 q_2 \ldots q_n} = 1. \qquad (1)$$

Definition I. n experiments 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾ are called
mutually independent, if for any $q_1, q_2, \ldots, q_n$ the following
equation holds true:

$$P\bigl(A_{q_1}^{(1)} A_{q_2}^{(2)} \cdots A_{q_n}^{(n)}\bigr) = P\bigl(A_{q_1}^{(1)}\bigr)\,P\bigl(A_{q_2}^{(2)}\bigr) \cdots P\bigl(A_{q_n}^{(n)}\bigr). \qquad (2)$$



⁷ One may construct a field of probability with arbitrary probabilities sub-
ject only to the above-mentioned conditions, as follows: E is composed of r
elements $\xi_{q_1 q_2 \ldots q_n}$. Let the corresponding elementary probabilities be
$p_{q_1 q_2 \ldots q_n}$, and finally let $A_q^{(i)}$ be the set of all $\xi_{q_1 q_2 \ldots q_n}$ for which
$q_i = q$.




Among the r equations in (2), there are only $r - r_1 - r_2 - \cdots - r_n + n - 1$
independent equations⁸.

Theorem I. If n experiments 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾ are mutu-
ally independent, then any m of them ($m < n$), $\mathfrak{A}^{(i_1)}, \mathfrak{A}^{(i_2)}, \ldots, \mathfrak{A}^{(i_m)}$,
are also independent⁹.

In the case of independence we then have the equations:

$$P\bigl(A_{q_1}^{(i_1)} A_{q_2}^{(i_2)} \cdots A_{q_m}^{(i_m)}\bigr) = P\bigl(A_{q_1}^{(i_1)}\bigr)\,P\bigl(A_{q_2}^{(i_2)}\bigr) \cdots P\bigl(A_{q_m}^{(i_m)}\bigr) \qquad (3)$$

(all the $i_k$ must be different).

Definition II. n events $A_1, A_2, \ldots, A_n$ are mutually indepen-
dent, if the decompositions (trials)

$$E = A_k + \bar{A}_k \qquad (k = 1, 2, \ldots, n)$$

are independent.

In this case $r_1 = r_2 = \cdots = r_n = 2$, $r = 2^n$; therefore, of the $2^n$
equations in (2) only $2^n - n - 1$ are independent. The necessary
and sufficient conditions for the independence of the events $A_1, A_2,
\ldots, A_n$ are the following $2^n - n - 1$ equations¹⁰:

$$P(A_{i_1} A_{i_2} \cdots A_{i_m}) = P(A_{i_1})\,P(A_{i_2}) \cdots P(A_{i_m}), \qquad (4)$$
$$m = 1, 2, \ldots, n, \qquad 1 \le i_1 < i_2 < \cdots < i_m \le n.$$

All of these equations are mutually independent.

In the case n = 2 we obtain from (4) only one condition ($2^2 - 2 - 1 = 1$)
for the independence of two events $A_1$ and $A_2$:

$$P(A_1 A_2) = P(A_1)\,P(A_2). \qquad (5)$$

The system of equations (2) reduces itself, in this case, to three
equations besides (5):

$$P(A_1 \bar{A}_2) = P(A_1)\,P(\bar{A}_2),$$
$$P(\bar{A}_1 A_2) = P(\bar{A}_1)\,P(A_2),$$
$$P(\bar{A}_1 \bar{A}_2) = P(\bar{A}_1)\,P(\bar{A}_2),$$

which obviously follow from (5).¹¹

⁸ Actually, in the case of independence, one may choose arbitrarily only
$r_1 + r_2 + \cdots + r_n$ probabilities $p_q^{(i)} = P(A_q^{(i)})$ so as to comply with the n
conditions

$$\sum_q p_q^{(i)} = 1.$$

Therefore, in the general case, we have $r - 1$ degrees of freedom, but in the
case of independence only $r_1 + r_2 + \cdots + r_n - n$.

⁹ To prove this it is sufficient to show that from the mutual independence
of n decompositions follows the mutual independence of the first $n - 1$. Let us
assume that the equations (2) hold. Then

$$P\bigl(A_{q_1}^{(1)} \cdots A_{q_{n-1}}^{(n-1)}\bigr) = \sum_{q_n} P\bigl(A_{q_1}^{(1)} \cdots A_{q_n}^{(n)}\bigr) = \sum_{q_n} P\bigl(A_{q_1}^{(1)}\bigr) \cdots P\bigl(A_{q_n}^{(n)}\bigr) = P\bigl(A_{q_1}^{(1)}\bigr) \cdots P\bigl(A_{q_{n-1}}^{(n-1)}\bigr). \quad \text{Q.E.D.}$$

¹⁰ See S. N. Bernstein [1] pp. 47-57. However, the reader can easily prove
this himself (using mathematical induction).

It need hardly be remarked that from the independence of
the events $A_1, A_2, \ldots, A_n$ in pairs, i.e. from the relations

$$P(A_i A_j) = P(A_i)\,P(A_j) \qquad (i \ne j),$$

it does not at all follow that when n > 2 these events are inde-
pendent¹². (For that we need the existence of all equations (4).)

In introducing the concept of independence, no use was made
of conditional probability. Our aim has been to explain as clearly
as possible, in a purely mathematical manner, the meaning of this
concept. Its applications, however, generally depend upon the
properties of certain conditional probabilities.

If we assume that all probabilities $P(A_q^{(i)})$ are positive, then
from the equations (3) it follows¹³ that

$$P_{A_{q_1}^{(i_1)} A_{q_2}^{(i_2)} \cdots A_{q_k}^{(i_k)}}\bigl(A_q^{(i)}\bigr) = P\bigl(A_q^{(i)}\bigr). \qquad (6)$$

From the fact that formulas (6) hold, and from the Multiplica-
tion Theorem (Formula (7), § 4), follow the formulas (2). We
obtain, therefore,

Theorem II: A necessary and sufficient condition for inde-
pendence of experiments 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾ in the case of posi-
tive probabilities $P(A_q^{(i)})$ is that the conditional probability of
the results $A_q^{(i)}$ of experiment $\mathfrak{A}^{(i)}$ under the hypothesis that
several other tests $\mathfrak{A}^{(i_1)}, \mathfrak{A}^{(i_2)}, \ldots, \mathfrak{A}^{(i_k)}$ have had definite results
$A_{q_1}^{(i_1)}, A_{q_2}^{(i_2)}, \ldots, A_{q_k}^{(i_k)}$ is equal to the absolute probability
$P(A_q^{(i)})$.

¹¹ $P(A_1 \bar{A}_2) = P(A_1) - P(A_1 A_2) = P(A_1) - P(A_1)P(A_2) = P(A_1)\{1 - P(A_2)\} = P(A_1)P(\bar{A}_2)$, etc.

¹² This can be shown by the following simple example (S. N. Bernstein):
Let the set E be composed of four elements ξ₁, ξ₂, ξ₃, ξ₄; the corresponding
elementary probabilities $p_1, p_2, p_3, p_4$ are each assumed to be ¼, and

$$A = \{\xi_1, \xi_2\}, \qquad B = \{\xi_1, \xi_3\}, \qquad C = \{\xi_1, \xi_4\}.$$

It is easy to compute that

$$P(A) = P(B) = P(C) = \tfrac{1}{2},$$
$$P(AB) = P(BC) = P(AC) = \tfrac{1}{4} = (\tfrac{1}{2})^2,$$
$$P(ABC) = \tfrac{1}{4} \ne (\tfrac{1}{2})^3.$$

¹³ To prove it, one must keep in mind the definition of conditional proba-
bility (Formula (5), § 4) and substitute for the probabilities of products the
products of probabilities according to formula (3).
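Bernstein's example in footnote 12 can be verified mechanically (a Python sketch using exact rational arithmetic):

    from fractions import Fraction

    p = {1: Fraction(1, 4), 2: Fraction(1, 4),
         3: Fraction(1, 4), 4: Fraction(1, 4)}    # p_1 = ... = p_4 = 1/4
    A, B, C = {1, 2}, {1, 3}, {1, 4}

    def P(S):
        return sum(p[xi] for xi in S)

    # Independence in pairs, relations of the form P(A_i A_j) = P(A_i) P(A_j):
    assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
    assert P(B & C) == P(B) * P(C) == Fraction(1, 4)
    assert P(A & C) == P(A) * P(C) == Fraction(1, 4)
    # ...but the full condition (4) fails for m = 3:
    assert P(A & B & C) == Fraction(1, 4) != P(A) * P(B) * P(C)   # 1/4 vs. 1/8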

On the basis of formulas (4) we can prove in an analogous
manner the following theorem:

Theorem III. If all probabilities $P(A_k)$ are positive, then a
necessary and sufficient condition for mutual independence of
the events $A_1, A_2, \ldots, A_n$ is the satisfaction of the equations

$$P_{A_{i_1} A_{i_2} \ldots A_{i_k}}(A_i) = P(A_i) \qquad (7)$$

for any pairwise different $i_1, i_2, \ldots, i_k, i$.

In the case n = 2 the conditions (7) reduce to two equations:

$$P_{A_1}(A_2) = P(A_2),$$
$$P_{A_2}(A_1) = P(A_1). \qquad (8)$$

It is easy to see that the first equation in (8) alone is a necessary
and sufficient condition for the independence of $A_1$ and $A_2$ pro-
vided $P(A_1) > 0$.

§ 6. Conditional Probabilities as Random Variables;
Markov Chains

Let 𝔄 be a decomposition of the fundamental set E:

$$E = A_1 + A_2 + \cdots + A_r,$$

and x a real function of the elementary event ξ, which for every
set $A_q$ is equal to a corresponding constant $a_q$. x is then called a
random variable, and the sum

$$E(x) = \sum_q a_q\,P(A_q)$$

is called the mathematical expectation of the variable x. The
theory of random variables will be developed in Chaps. III and IV.
We shall not limit ourselves there merely to those random vari-
ables which can assume only a finite number of different values.

A random variable which for every set $A_q$ assumes the value
$P_{A_q}(B)$, we shall call the conditional probability of the event B
after the given experiment 𝔄, and shall designate it by $P_{\mathfrak{A}}(B)$. Two
experiments 𝔄⁽¹⁾ and 𝔄⁽²⁾ are independent if, and only if,

$$P_{\mathfrak{A}^{(1)}}\bigl(A_q^{(2)}\bigr) = P\bigl(A_q^{(2)}\bigr), \qquad q = 1, 2, \ldots, r_2.$$

Given any decompositions (experiments) 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾, we
shall represent by

$$\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \cdots \mathfrak{A}^{(n)}$$

the decomposition of set E into the products

$$A_{q_1}^{(1)} A_{q_2}^{(2)} \cdots A_{q_n}^{(n)}.$$

Experiments 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾ are mutually independent when
and only when

$$P_{\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \cdots \mathfrak{A}^{(k-1)}}\bigl(A_q^{(k)}\bigr) = P\bigl(A_q^{(k)}\bigr),$$

k and q being arbitrary¹⁴.

Definition: The sequence 𝔄⁽¹⁾, 𝔄⁽²⁾, . . . , 𝔄⁽ⁿ⁾, . . . forms
a Markov chain if for arbitrary n and q

$$P_{\mathfrak{A}^{(1)} \mathfrak{A}^{(2)} \cdots \mathfrak{A}^{(n-1)}}\bigl(A_q^{(n)}\bigr) = P_{\mathfrak{A}^{(n-1)}}\bigl(A_q^{(n)}\bigr).$$

Thus, Markov chains form a natural generalization of se-
quences of mutually independent experiments. If we set

$$p_{q_m q_n}(m, n) = P_{A_{q_m}^{(m)}}\bigl(A_{q_n}^{(n)}\bigr), \qquad m < n,$$

then the basic formula of the theory of Markov chains will assume
the form:

$$p_{q_k q_n}(k, n) = \sum_{q_m} p_{q_k q_m}(k, m)\,p_{q_m q_n}(m, n), \qquad k < m < n. \qquad (1)$$

If we denote the matrix $\|p_{q_m q_n}(m, n)\|$ by $p(m, n)$, (1) can be
written as¹⁵:

$$p(k, n) = p(k, m)\,p(m, n), \qquad k < m < n. \qquad (2)$$



¹⁴ The necessity of these conditions follows from Theorem II, § 5; that they
are also sufficient follows immediately from the Multiplication Theorem
(Formula (7) of § 4).

¹⁵ For further development of the theory of Markov chains, see R. v. Mises
[1], § 16, and B. Hostinský, Méthodes générales du calcul des probabilités,
"Mém. Sci. Math." V. 52, Paris 1931.



Chapter II 

INFINITE PROBABILITY FIELDS 

§ 1. Axiom of Continuity 

We denote by $\mathfrak{D}_m A_m$, as is customary, the product of the sets
$A_m$ (whether finite or infinite in number) and their sum by $\mathfrak{S}_m A_m$.
Only in the case of disjoint sets $A_m$ is the form $\sum_m A_m$ used instead
of $\mathfrak{S}_m A_m$. Consequently,

$$\mathfrak{S}_m A_m = A_1 + A_2 + \cdots,$$
$$\sum_m A_m = A_1 + A_2 + \cdots,$$
$$\mathfrak{D}_m A_m = A_1 A_2 \cdots.$$

In all future investigations, we shall assume that besides Axioms
I-V, still another holds true:

VI. For a decreasing sequence of events

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots \qquad (1)$$

of 𝔉, for which

$$\mathfrak{D}_n A_n = 0, \qquad (2)$$

the following equation holds:

$$\lim_{n \to \infty} P(A_n) = 0. \qquad (3)$$

In the future we shall designate by probability field only a
field of probability as outlined in the first chapter, which also
satisfies Axiom VI. The fields of probability as defined in the first
chapter without Axiom VI might be called generalized fields of
probability.

If the system 𝔉 of sets is finite, Axiom VI follows from Axioms
I-V. For actually, in that case there exist only a finite number
of different sets in the sequence (1). Let $A_k$ be the smallest
among them; then all sets $A_{k+p}$ coincide with $A_k$, so that
$A_k = \mathfrak{D}_n A_n = 0$, and we obtain then

$$\lim_{n \to \infty} P(A_n) = P(A_k) = P(0) = 0.$$

All examples of finite fields of probability, in the first chapter,
satisfy, therefore, Axiom VI. The system of Axioms I-VI then
proves to be consistent and incomplete.

For infinite fields, on the other hand, the Axiom of Continuity, 
VI, proved to be independent of Axioms I - V. Since the new axiom 
is essential for infinite fields of probability only, it is almost im- 
possible to elucidate its empirical meaning, as has been done, for 
example, in the case of Axioms I - V in § 2 of the first chapter. 
For, in describing any observable random process we can obtain 
only finite fields of probability. Infinite fields of probability occur 
only as idealized models of real random processes. We limit our- 
selves, arbitrarily, to only those models which satisfy Axiom VI. 
This limitation has been found expedient in researches of the 
most diverse sort. 

Generalized Addition Theorem: If $A_1, A_2, \ldots, A_n, \ldots$ and
A belong to 𝔉, then from

$$A = \sum_n A_n \qquad (4)$$

follows the equation

$$P(A) = \sum_n P(A_n). \qquad (5)$$

Proof: Let

$$R_n = A_{n+1} + A_{n+2} + \cdots.$$

Then, obviously,

$$\mathfrak{D}_n R_n = 0,$$

and, therefore, according to Axiom VI,

$$\lim_{n \to \infty} P(R_n) = 0. \qquad (6)$$

On the other hand, by the addition theorem,

$$P(A) = P(A_1) + P(A_2) + \cdots + P(A_n) + P(R_n). \qquad (7)$$

From (6) and (7) we immediately obtain (5).

We have shown, then, that the probability P(A) is a com-
pletely additive set function on 𝔉. Conversely, Axioms V and VI
hold true for every completely additive set function defined on
any field 𝔉.* We can, therefore, define the concept of a field of
probability in the following way: Let E be an arbitrary set, 𝔉 a
field of subsets of E, containing E, and P(A) a non-negative com-
pletely additive set function defined on 𝔉; the field 𝔉 together
with the set function P(A) forms a field of probability.
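A concrete instance of Axiom VI and of complete additivity (a Python sketch; the geometric elementary probabilities $p_n = 2^{-n}$ on a countable set are an arbitrary choice): the decreasing events $A_n = \{\xi_{n+1}, \xi_{n+2}, \ldots\}$ have empty intersection, and $P(A_n) = 2^{-n}$ indeed tends to zero.

    # P(A_n) for the decreasing tail events A_n; the closed form of the tail sum.
    def P_tail(n):
        return 2.0 ** -n

    print([P_tail(n) for n in (1, 10, 50)])    # tends to 0, as Axiom VI demands

    # Complete additivity, equation (5): the p_n themselves sum to P(E) = 1.
    partial = sum(2.0 ** -k for k in range(1, 60))
    assert abs(partial - 1.0) < 1e-15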

A Covering Theorem: If $A, A_1, A_2, \ldots, A_n, \ldots$ belong to 𝔉
and

$$A \subset \mathfrak{S}_n A_n, \qquad (8)$$

then

$$P(A) \le \sum_n P(A_n). \qquad (9)$$

Proof:

$$A = A\,\mathfrak{S}_n A_n = A A_1 + A(A_2 - A_2 A_1) + A(A_3 - A_3 A_2 - A_3 A_1) + \cdots,$$
$$P(A) = P(A A_1) + P\{A(A_2 - A_2 A_1)\} + \cdots \le P(A_1) + P(A_2) + \cdots.$$

§ 2. Borel Fields of Probability

The field 𝔉 is called a Borel field, if all countable sums $\sum_n A_n$
of the sets $A_n$ from 𝔉 belong to 𝔉. Borel fields are also called com-
pletely additive systems of sets. From the formula

$$\mathfrak{S}_n A_n = A_1 + (A_2 - A_2 A_1) + (A_3 - A_3 A_2 - A_3 A_1) + \cdots \qquad (1)$$

we can deduce that a Borel field contains also all the sums $\mathfrak{S}_n A_n$
composed of a countable number of sets $A_n$ belonging to it. From
the formula

$$\mathfrak{D}_n A_n = E - \mathfrak{S}_n \bar{A}_n \qquad (2)$$

the same can be said for the product of sets.

A field of probability is a Borel field of probability if the
corresponding field 𝔉 is a Borel field. Only in the case of Borel
fields of probability do we obtain full freedom of action, without
danger of the occurrence of events having no probability. We
shall now prove that we may limit ourselves to the investigation
of Borel fields of probability. This will follow from the so-called
extension theorem, to which we shall now turn.

Given a field of probability (𝔉, P). As is known¹, there exists
a smallest Borel field B𝔉 containing 𝔉. And we have the

* See, for example, O. Nikodym, Sur une généralisation des intégrales de
M. J. Radon, Fund. Math. v. 15, 1930, p. 136.
¹ Hausdorff, Mengenlehre, 1927, p. 85.




Extension Theorem: It is always possible to extend a non-
negative completely additive set function P(A), defined in 𝔉,
to all sets of B𝔉 without losing either of its properties (non-
negativeness and complete additivity) and this can be done in
only one way.

The extended field B𝔉 forms with the extended set func-
tion P(A) a field of probability (B𝔉, P). This field of probability
(B𝔉, P) we shall call the Borel extension of the field (𝔉, P).

The proof of this theorem, which belongs to the theory of
additive set functions and which sometimes appears in other
forms, can be given as follows:

Let A be any subset of E; we shall denote by P*(A) the lower
limit of the sums

$$\sum_n P(A_n)$$

for all coverings

$$A \subset \mathfrak{S}_n A_n$$

of the set A by a finite or countable number of sets $A_n$ of 𝔉. It is
easy to prove that P*(A) is then an outer measure in the
Carathéodory sense². In accordance with the Covering Theorem
(§ 1), P*(A) coincides with P(A) for all sets of 𝔉. It can be fur-
ther shown that all sets of 𝔉 are measurable in the Carathéodory
sense. Since all measurable sets form a Borel field, all sets of B𝔉
are consequently measurable. The set function P*(A) is, there-
fore, completely additive on B𝔉, and on B𝔉 we may set

$$P(A) = P^*(A).$$

We have thus shown the existence of the extension. The unique-
ness of this extension follows immediately from the minimal
property of the field B𝔉.

Remark: Even if the sets (events) A of 𝔉 can be interpreted
as actual and (perhaps only approximately) observable events,
it does not, of course, follow from this that the sets of the extended
field B𝔉 reasonably admit of such an interpretation.

Thus there is the possibility that while a field of probability
(𝔉, P) may be regarded as the image (idealized, however) of
actual random events, the extended field of probability (B𝔉, P)
will still remain merely a mathematical structure.

Thus sets of B𝔉 are generally merely ideal events to which
nothing corresponds in the outside world. However, if reasoning
which utilizes the probabilities of such ideal events leads us to a
determination of the probability of an actual event of 𝔉, then,
from an empirical point of view also, this determination will
automatically fail to be contradictory.

² Carathéodory, Vorlesungen über reelle Funktionen, pp. 237-258. (New
York, Chelsea Publishing Company.)

§ 3. Examples of Infinite Fields of Probability

I. In § 1 of the first chapter, we have constructed various
finite probability fields.

Let now $E = \{\xi_1, \xi_2, \ldots, \xi_n, \ldots\}$ be a countable set, and let 𝔉
coincide with the aggregate of the subsets of E.

All possible probability fields with such an aggregate 𝔉 are
obtained in the following manner:

We take a sequence of non-negative numbers $p_n$ such that

$$p_1 + p_2 + \cdots + p_n + \cdots = 1,$$

and for each set A put

$$P(A) = \sum{}' p_n,$$

where the summation $\sum'$ extends to all the indices n for which
$\xi_n$ belongs to A. These fields of probability are obviously Borel
fields.
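For a concrete sketch of this construction (Python; the Poisson weights, indexed from 0, are an arbitrary choice of the sequence $p_n$):

    from math import exp, factorial

    lam = 2.0                                      # an assumed parameter
    def p(n):
        """p_n = e^{-lam} lam^n / n!, non-negative with sum 1."""
        return exp(-lam) * lam ** n / factorial(n)

    def P(indices):
        """P(A) = sum' of p_n over the n with xi_n in A."""
        return sum(p(n) for n in indices)

    print(P({0, 1, 2}))                            # P of a three-point event
    print(sum(p(n) for n in range(100)))           # ≈ 1, the normalization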

II. In this example, we shall assume that E represents the
real number axis. At first, let 𝔉 be formed of all possible finite
sums of half-open intervals $[a; b) = \{a \le \xi < b\}$ (taking into
consideration not only the proper intervals, with finite a and b,
but also the improper intervals $[-\infty; a)$, $[a; +\infty)$ and
$[-\infty; +\infty)$). 𝔉 is then a field. By means of the extension theorem, how-
ever, each field of probability on 𝔉 can be extended to a similar
field on B𝔉. The system of sets B𝔉 is, therefore, in our case
nothing but the system of all Borel point sets on a line. Let us
turn now to the following case.

III. Again suppose E to be the real number axis, while 𝔉 is
composed of all Borel point sets of this line. In order to construct
a field of probability with the given field 𝔉, it is sufficient to
define an arbitrary non-negative completely additive set-function
P(A) on 𝔉 which satisfies the condition P(E) = 1. As is well
known³, such a function is uniquely determined by its values

$$P[-\infty; x) = F(x) \qquad (1)$$

for the special intervals $[-\infty; x)$. The function F(x) is called the
distribution function of ξ. Further on (Chap. III, § 2) we shall
show that F(x) is non-decreasing, continuous on the left, and
has the following limiting values:

$$\lim_{x \to -\infty} F(x) = F(-\infty) = 0, \qquad \lim_{x \to +\infty} F(x) = F(+\infty) = 1. \qquad (2)$$

Conversely, if a given function F(x) satisfies these conditions,
then it always determines a non-negative completely additive set-
function P(A) for which P(E) = 1⁴.

IV. Let us now consider the basic set E as an n-dimensional
Euclidian space $R^n$, i.e., the set of all ordered n-tuples
$\xi = (x_1, x_2, \ldots, x_n)$ of real numbers. Let 𝔉 consist, in this case, of all Borel
point-sets⁵ of the space $R^n$. On the basis of reasoning analogous
to that used in Example II, we need not investigate narrower sys-
tems of sets, for example the systems of n-dimensional intervals.

The role of probability function P(A) will be played here,
as always, by any non-negative and completely additive set-
function defined on 𝔉 and satisfying the condition P(E) = 1. Such
a set-function is determined uniquely if we assign its values

$$P(L_{a_1 a_2 \ldots a_n}) = F(a_1, a_2, \ldots, a_n) \qquad (3)$$

for the special sets $L_{a_1 a_2 \ldots a_n}$, where $L_{a_1 a_2 \ldots a_n}$ represents the
aggregate of all ξ for which $x_i < a_i$ ($i = 1, 2, \ldots, n$).

For our function $F(a_1, a_2, \ldots, a_n)$ we may choose any function
which for each variable is non-decreasing and continuous on the
left, and which satisfies the following conditions:

$$\lim_{a_i \to -\infty} F(a_1, a_2, \ldots, a_n) = F(a_1, \ldots, a_{i-1}, -\infty, a_{i+1}, \ldots, a_n) = 0, \quad i = 1, 2, \ldots, n,$$
$$\lim_{a_1, \ldots, a_n \to +\infty} F(a_1, a_2, \ldots, a_n) = F(+\infty, +\infty, \ldots, +\infty) = 1. \qquad (4)$$

$F(a_1, a_2, \ldots, a_n)$ is called the distribution function of the vari-
ables $x_1, x_2, \ldots, x_n$.



³ Cf., for example, Lebesgue, Leçons sur l'intégration, 1928, p. 152-156.
⁴ See the previous note.
⁵ For a definition of Borel sets in $R^n$ see Hausdorff, Mengenlehre, 1927,
pp. 177-181.




The investigation of fields of probability of the above type
is sufficient for all classical problems in the theory of probability⁶.
In particular, a probability function in $R^n$ can be defined thus:
We take any non-negative point function $f(x_1, x_2, \ldots, x_n)$
defined in $R^n$, such that

$$\int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} \cdots \int_{-\infty}^{+\infty} f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \ldots dx_n = 1,$$

and set

$$P(A) = \iint \cdots \int_A f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \ldots dx_n. \qquad (5)$$

$f(x_1, x_2, \ldots, x_n)$ is, in this case, the probability density at the
point $(x_1, x_2, \ldots, x_n)$ (cf. Chap. III, § 2).
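A numerical sketch of formula (5) for n = 2 (Python; the Gaussian density and the rectangle A are assumptions made for illustration):

    from math import exp, pi

    def f(x1, x2):
        """A two-dimensional probability density (standard normal)."""
        return exp(-(x1 * x1 + x2 * x2) / 2) / (2 * pi)

    # P(A) for the rectangle A = [-1, 1) x [-1, 1), approximated by a Riemann sum.
    N = 200
    h = 2.0 / N
    P_A = sum(f(-1 + (i + 0.5) * h, -1 + (j + 0.5) * h) * h * h
              for i in range(N) for j in range(N))
    print(P_A)                                     # ≈ 0.4661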

Another type of probability function in $R^n$ is obtained in the
following manner: Let $\{\xi_i\}$ be a sequence of points of $R^n$, and
let $\{p_i\}$ be a sequence of non-negative real numbers, such that
$\sum p_i = 1$; we then set, as we did in Example I,

$$P(A) = \sum{}' p_i,$$

where the summation $\sum'$ extends over all indices i for which $\xi_i$
belongs to A. The two types of probability functions in $R^n$ men-
tioned here do not exhaust all possibilities, but are usually con-
sidered sufficient for applications of the theory of probability.
Nevertheless, we can imagine problems of interest for applica-
tions outside of this classical region in which elementary events
are defined by means of an infinite number of coordinates. The
corresponding fields of probability we shall study more closely
after introducing several concepts needed for this purpose. (Cf.
Chap. III, § 4.)



6 Cf., for example, R. v. Mises [1], pp. 13-19. Here the existence of proba- 
bilities for "all practically possible" sets of an n-dimensional space is 
required. 



Chapter III 



RANDOM VARIABLES 

§ 1. Probability Functions 

Given a mapping of the set E into a set E′ consisting of any
type of elements, i.e., a single-valued function u(ξ) defined on E,
whose values belong to E′. To each subset A′ of E′ we shall put
into correspondence, as its pre-image in E, the set $u^{-1}(A')$ of all
elements of E which map onto elements of A′. Let 𝔉⁽ᵘ⁾ be the
system of all subsets A′ of E′ whose pre-images belong to the
field 𝔉. 𝔉⁽ᵘ⁾ will then also be a field. If 𝔉 happens to be a Borel
field, the same will be true of 𝔉⁽ᵘ⁾. We now set

$$P^{(u)}(A') = P\{u^{-1}(A')\}. \qquad (1)$$

Since this set-function P⁽ᵘ⁾, defined on 𝔉⁽ᵘ⁾, satisfies with respect
to the field 𝔉⁽ᵘ⁾ all of our Axioms I-VI, it represents a proba-
bility function on 𝔉⁽ᵘ⁾. Before turning to the proof of all the facts
just stated, we shall formulate the following definition.

Definition. Given a single-valued function u(ξ) of a random
event ξ. The function P⁽ᵘ⁾(A′), defined by (1), is then called the
probability function of u.

Remark 1: In studying fields of probability (𝔉, P), we call the
function P(A) simply the probability function, but P⁽ᵘ⁾(A′) is
called the probability function of u. In the case u(ξ) = ξ, P⁽ᵘ⁾(A′)
coincides with P(A).

Remark 2: The event $u^{-1}(A')$ consists of the fact that u(ξ)
belongs to A′. Therefore, P⁽ᵘ⁾(A′) is the probability of $u(\xi) \in A'$.

We still have to prove the above-mentioned properties of 𝔉⁽ᵘ⁾
and P⁽ᵘ⁾. They follow, however, from a single fact, namely:

Lemma. The sum, product, and difference of any pre-image
sets $u^{-1}(A')$ are the pre-images of the corresponding sums, prod-
ucts, and differences of the original sets A′.

The proof of this lemma is left for the reader.


Let A′ and B′ be two sets of 𝔉⁽ᵘ⁾. Their pre-images A and B
belong then to 𝔉. Since 𝔉 is a field, the sets AB, A + B, and A − B
also belong to 𝔉; but these sets are the pre-images of the sets A′B′,
A′ + B′, and A′ − B′, which thus belong to 𝔉⁽ᵘ⁾. This proves that
𝔉⁽ᵘ⁾ is a field. In the same manner it can be shown that if 𝔉 is a
Borel field, so is 𝔉⁽ᵘ⁾.

Furthermore, it is clear that

$$P^{(u)}(E') = P\{u^{-1}(E')\} = P(E) = 1.$$

That P⁽ᵘ⁾ is always non-negative, is self-evident. It remains only
to be shown, therefore, that P⁽ᵘ⁾ is completely additive (cf. the
end of § 1, Chap. II).

Let us assume that the sets $A'_n$, and therefore their pre-images
$u^{-1}(A'_n)$, are disjoint. It follows that

$$P^{(u)}\Bigl(\sum_n A'_n\Bigr) = P\Bigl\{u^{-1}\Bigl(\sum_n A'_n\Bigr)\Bigr\} = P\Bigl\{\sum_n u^{-1}(A'_n)\Bigr\} = \sum_n P\{u^{-1}(A'_n)\} = \sum_n P^{(u)}(A'_n),$$

which proves the complete additivity of P⁽ᵘ⁾.

In conclusion let us also note the following. Let $u_1(\xi)$ be a
function mapping E on E′, and $u_2(\xi')$ be another function, map-
ping E′ on E″. The product function $u_2 u_1(\xi)$ maps E on E″. We
shall now study the probability functions $P^{(u_1)}(A')$ and $P^{(u)}(A'')$
for the functions $u_1(\xi)$ and $u(\xi) = u_2 u_1(\xi)$. It is easy to show
that these two probability functions are connected by the follow-
ing relation:

$$P^{(u)}(A'') = P^{(u_1)}\{u_2^{-1}(A'')\}. \qquad (2)$$
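For a discrete field the definitions of this section reduce to bookkeeping with pre-images, as the following Python sketch shows (the field and the mapping u are arbitrary assumptions):

    p = {'a': 0.25, 'b': 0.25, 'c': 0.5}     # elementary probabilities on E
    u = {'a': 1, 'b': 1, 'c': 2}             # a mapping u of E into E' = {1, 2}

    def P(A):
        return sum(p[xi] for xi in A)

    def P_u(A_prime):
        """Formula (1): P^(u)(A') = P{u^{-1}(A')}."""
        return P({xi for xi in p if u[xi] in A_prime})

    print(P_u({1}))       # 0.5 = p(a) + p(b)
    print(P_u({1, 2}))    # 1.0, as required of a probability function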

§ 2. Definition of Random Variables and of
Distribution Functions

Definition. A real single-valued function x(ξ), defined on the
basic set E, is called a random variable if for each choice of a real
number a the set {x < a} of all ξ for which the inequality x < a
holds true, belongs to the system of sets 𝔉.

This function x(ξ) maps the basic set E into the set R¹ of all
real numbers. This function determines, as in § 1, a field 𝔉⁽ˣ⁾ of
subsets of the set R¹. We may formulate our definition of random
variable in this manner: A real function x(ξ) is a random variable
if and only if 𝔉⁽ˣ⁾ contains every interval of the form (−∞; a).




Since 𝔉⁽ˣ⁾ is a field, then along with the intervals (−∞; a) it
contains all possible finite sums of half-open intervals [a; b). If
our field of probability is a Borel field, then 𝔉 and 𝔉⁽ˣ⁾ are Borel
fields; therefore, in this case 𝔉⁽ˣ⁾ contains all Borel sets of R¹.

The probability function of a random variable we shall denote
in the future by P⁽ˣ⁾(A′). It is defined for all sets of the field 𝔉⁽ˣ⁾.
In particular, for the most important case, the Borel field of
probability, P⁽ˣ⁾ is defined for all Borel sets of R¹.

Definition. The function

$$F^{(x)}(a) = P^{(x)}(-\infty; a) = P\{x < a\},$$

where −∞ and +∞ are allowable values of a, is called the distri-
bution function of the random variable x.

From the definition it follows at once that

$$F^{(x)}(-\infty) = 0, \qquad F^{(x)}(+\infty) = 1. \qquad (1)$$

The probability of the realization of both inequalities $a \le x < b$
is obviously given by the formula

$$P\{x \in [a; b)\} = F^{(x)}(b) - F^{(x)}(a). \qquad (2)$$

From this, we have, for a < b,

$$F^{(x)}(a) \le F^{(x)}(b),$$

which means that $F^{(x)}(a)$ is a non-decreasing function. Now let
$a_1 < a_2 < \cdots < a_n < \cdots < b$; then

$$\mathfrak{D}_n \{x \in [a_n; b)\} = 0.$$

Therefore, in accordance with the continuity axiom,

$$F^{(x)}(b) - F^{(x)}(a_n) = P\{x \in [a_n; b)\}$$

approaches zero as $n \to +\infty$. From this it is clear that $F^{(x)}(a)$ is
continuous on the left.

In an analogous way we can prove the formulae:

$$\lim F^{(x)}(a) = F^{(x)}(-\infty) = 0, \qquad a \to -\infty, \qquad (3)$$
$$\lim F^{(x)}(a) = F^{(x)}(+\infty) = 1, \qquad a \to +\infty. \qquad (4)$$

If the field of probability (𝔉, P) is a Borel field, the values of
the probability function P⁽ˣ⁾(A) for all Borel sets A of R¹ are
uniquely determined by knowledge of the distribution function
F⁽ˣ⁾(a) (cf. § 3, III in Chap. II). Since our main interest lies in
these values of P⁽ˣ⁾(A), the distribution function plays a most
significant role in all our future work.

If the distribution function F⁽ˣ⁾(a) is differentiable, then we
call its derivative with respect to a,

$$f^{(x)}(a) = \frac{d}{da} F^{(x)}(a),$$

the probability density of x at the point a.

If also $F^{(x)}(a) = \int_{-\infty}^{a} f^{(x)}(a)\,da$ for each a, then we may ex-
press the probability function P⁽ˣ⁾(A) for each Borel set A in
terms of f⁽ˣ⁾(a) in the following manner:

$$P^{(x)}(A) = \int_A f^{(x)}(a)\,da. \qquad (5)$$

In this case we call the distribution of x continuous. And in the
general case, we write, analogously,

$$P^{(x)}(A) = \int_A dF^{(x)}(a). \qquad (6)$$

All the concepts just introduced are capable of generalization
for conditional probabilities. The set function

$$P_B^{(x)}(A) = P_B(x \in A)$$

is the conditional probability function of x under hypothesis B.
The non-decreasing function

$$F_B^{(x)}(a) = P_B(x < a)$$

is the corresponding distribution function, and, finally (in the
case where $F_B^{(x)}(a)$ is differentiable),

$$f_B^{(x)}(a) = \frac{d}{da} F_B^{(x)}(a)$$

is the conditional probability density of x at the point a under
hypothesis B.
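A one-dimensional sketch of formulas (2) and (5) (Python; the exponential law is an assumed example):

    from math import exp

    def F(a):
        """Distribution function of an exponentially distributed x."""
        return 1 - exp(-a) if a > 0 else 0.0

    def f(a):
        """Its probability density f = dF/da."""
        return exp(-a) if a > 0 else 0.0

    a, b = 0.5, 2.0
    prob = F(b) - F(a)                       # formula (2): P{x in [a; b)}
    steps = 15000                            # midpoint Riemann sum for formula (5)
    h = (b - a) / steps
    integral = sum(f(a + (i + 0.5) * h) * h for i in range(steps))
    assert abs(prob - integral) < 1e-6
    print(prob)                              # ≈ 0.4712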

§ 3. Multi-dimensional Distribution Functions

Let now n random variables $x_1, x_2, \ldots, x_n$ be given. The point
$x = (x_1, x_2, \ldots, x_n)$ of the n-dimensional space $R^n$ is a function
of the elementary event ξ. Therefore, according to the general
rules in § 1, we have a field $\mathfrak{F}^{(x_1, x_2, \ldots, x_n)}$ consisting of
subsets of space $R^n$ and a probability function $P^{(x_1, x_2, \ldots, x_n)}(A')$
defined on 𝔉′. This probability function is called the n-dimensional
probability function of the random variables $x_1, x_2, \ldots, x_n$.

As follows directly from the definition of a random variable,
the field 𝔉′ contains, for each choice of i and $a_i$ ($i = 1, 2, \ldots, n$),
the set of all points in $R^n$ for which $x_i < a_i$. Therefore 𝔉′ also con-
tains the intersection of the above sets, i.e. the set $L_{a_1 a_2 \ldots a_n}$
of all points of $R^n$ for which all the inequalities $x_i < a_i$ hold
($i = 1, 2, \ldots, n$)¹.

If we now denote as the n-dimensional half-open interval

$$[a_1, a_2, \ldots, a_n;\ b_1, b_2, \ldots, b_n)$$

the set of all points in $R^n$ for which $a_i \le x_i < b_i$, then we see at
once that each such interval belongs to the field 𝔉′, since

$$[a_1, a_2, \ldots, a_n;\ b_1, b_2, \ldots, b_n) = L_{b_1 b_2 \ldots b_n} - L_{a_1 b_2 \ldots b_n} - L_{b_1 a_2 b_3 \ldots b_n} - \cdots - L_{b_1 b_2 \ldots b_{n-1} a_n}.$$

The Borel extension of the system of all n-dimensional half-
open intervals consists of all Borel sets in $R^n$. From this it follows
that in the case of a Borel field of probability the field 𝔉′ contains
all the Borel sets in the space $R^n$.

Theorem: In the case of a Borel field of probability each Borel
function $x = f(x_1, x_2, \ldots, x_n)$ of a finite number of random vari-
ables $x_1, x_2, \ldots, x_n$ is also a random variable.

All we need to prove this is to point out that the set of all
points $(x_1, x_2, \ldots, x_n)$ in $R^n$ for which $x = f(x_1, x_2, \ldots, x_n) < a$
is a Borel set. In particular, all finite sums and products of random
variables are also random variables.

Definition: The function

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = P^{(x_1, x_2, \ldots, x_n)}(L_{a_1 a_2 \ldots a_n})$$

is called the n-dimensional distribution function of the random
variables $x_1, x_2, \ldots, x_n$.

¹ The $a_i$ may also assume the infinite values ±∞.

As in the one-dimensional case, we prove that the n-dimensional
distribution function $F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n)$ is non-decreas-
ing and continuous on the left in each variable. In analogy to
equations (3) and (4) in § 2, we here have

$$\lim_{a_i \to -\infty} F(a_1, a_2, \ldots, a_n) = F(a_1, \ldots, a_{i-1}, -\infty, a_{i+1}, \ldots, a_n) = 0, \qquad (7)$$
$$\lim_{a_1, \ldots, a_n \to +\infty} F(a_1, a_2, \ldots, a_n) = F(+\infty, +\infty, \ldots, +\infty) = 1. \qquad (8)$$

The distribution function $F^{(x_1, x_2, \ldots, x_n)}$ gives directly the values
of $P^{(x_1, x_2, \ldots, x_n)}$ only for the special sets $L_{a_1 a_2 \ldots a_n}$. If our field, how-
ever, is a Borel field, then² $P^{(x_1, x_2, \ldots, x_n)}$ is uniquely determined for
all Borel sets in $R^n$ by knowledge of the distribution function
$F^{(x_1, x_2, \ldots, x_n)}$.

If there exists the derivative

$$f(a_1, a_2, \ldots, a_n) = \frac{\partial^n F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n)}{\partial a_1\,\partial a_2 \cdots \partial a_n},$$

we call this derivative the n-dimensional probability density of
the random variables $x_1, x_2, \ldots, x_n$ at the point $a_1, a_2, \ldots, a_n$. If
also for every point $(a_1, a_2, \ldots, a_n)$

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = \int_{-\infty}^{a_1} \int_{-\infty}^{a_2} \cdots \int_{-\infty}^{a_n} f(a_1, a_2, \ldots, a_n)\,da_1\,da_2 \ldots da_n,$$

then the distribution of $x_1, x_2, \ldots, x_n$ is called continuous. For
every Borel set $A \subset R^n$, we have the equality

$$P^{(x_1, x_2, \ldots, x_n)}(A) = \iint \cdots \int_A f(a_1, a_2, \ldots, a_n)\,da_1\,da_2 \ldots da_n. \qquad (9)$$

In closing this section we shall make one more remark about
the relationships between the various probability functions and
distribution functions.

Given the substitution

$$s = \begin{pmatrix} 1, & 2, & \ldots, & n \\ i_1, & i_2, & \ldots, & i_n \end{pmatrix},$$

and let $T_s$ denote the transformation

$$x'_k = x_{i_k} \qquad (k = 1, 2, \ldots, n)$$

of space $R^n$ into itself. It is then obvious that

$$P^{(x_{i_1}, x_{i_2}, \ldots, x_{i_n})}(A) = P^{(x_1, x_2, \ldots, x_n)}\{T_s^{-1}(A)\}. \qquad (10)$$

Now let $x' = P_k(x)$ be the "projection" of the space $R^n$ on the
space $R^k$ ($k < n$), so that the point $(x_1, x_2, \ldots, x_n)$ is mapped onto
the point $(x_1, x_2, \ldots, x_k)$. Then, as a result of Formula (2) in § 1,

$$P^{(x_1, x_2, \ldots, x_k)}(A) = P^{(x_1, x_2, \ldots, x_n)}\{P_k^{-1}(A)\}. \qquad (11)$$

For the corresponding distribution functions, we obtain from
(10) and (11) the equations:

$$F^{(x_{i_1}, x_{i_2}, \ldots, x_{i_n})}(a_{i_1}, a_{i_2}, \ldots, a_{i_n}) = F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n), \qquad (12)$$
$$F^{(x_1, x_2, \ldots, x_k)}(a_1, a_2, \ldots, a_k) = F^{(x_1, x_2, \ldots, x_n)}(a_1, \ldots, a_k, +\infty, \ldots, +\infty). \qquad (13)$$

² Cf. § 3, IV in the Second Chapter.
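A discrete sketch of formula (13) (Python; the joint table of two random variables is an arbitrary assumption): letting the last argument go to $+\infty$ yields the marginal distribution function.

    # Joint probabilities P{x1 = v1, x2 = v2} for a toy pair of random variables.
    joint = {(0, 0): 0.125, (0, 1): 0.375,
             (1, 0): 0.25,  (1, 1): 0.25}

    def F(a1, a2):
        """F(a1, a2) = P{x1 < a1, x2 < a2}."""
        return sum(q for (v1, v2), q in joint.items() if v1 < a1 and v2 < a2)

    def F1(a1):
        """Formula (13): the distribution function of x1 alone."""
        return F(a1, float('inf'))

    print(F(1, 2))     # P{x1 < 1, x2 < 2} = 0.125 + 0.375 = 0.5
    print(F1(2))       # P{x1 < 2} = 1.0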

§ 4. Probabilities in Infinite-dimensional Spaces

In § 3 of the second chapter we have seen how to construct
various fields of probability common in the theory of probability.
We can imagine, however, interesting problems in which the
elementary events are defined by means of an infinite number
of coordinates. Let us take a set M of indices μ (indexing set) of
arbitrary cardinality 𝔪. The totality of all systems

$$\xi = \{x_\mu\}$$

of real numbers $x_\mu$, where μ runs through the entire set M, we
shall call the space $R^M$ (in order to define an element ξ in space
$R^M$, we must put each element μ in set M in correspondence with
a real number $x_\mu$ or, equivalently, assign a real single-valued
function $x_\mu$ of the element μ, defined on M)³.

³ Cf. Hausdorff, Mengenlehre, 1927, p. 23.

If the set M consists
of the first n natural numbers 1, 2, . . . , n, then $R^M$ is the ordinary
n-dimensional space $R^n$. If we choose for the set M all real num-
bers $R^1$, then the corresponding space $R^M = R^{R^1}$ will consist of
all real functions

$$\xi(\mu) = x_\mu$$

of the real variable μ.

We now take the set $R^M$ (with an arbitrary set M) as the
basic set E. Let $\xi = \{x_\mu\}$ be an element in E; we shall denote by
$p_{\mu_1 \mu_2 \ldots \mu_n}(\xi)$ the point $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$ of the n-dimensional
space $R^n$. A subset A of E we shall call a cylinder set if it can
be represented in the form

$$A = p_{\mu_1 \mu_2 \ldots \mu_n}^{-1}(A'),$$

where A′ is a subset of $R^n$. The class of all cylinder sets coincides,
therefore, with the class of all sets which can be defined by rela-
tions of the form


$$f(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n}) = 0. \qquad (1)$$

In order to determine an arbitrary cylinder set $p_{\mu_1 \mu_2 \ldots \mu_n}^{-1}(A')$ by
such a relation, we need only take as f a function which equals 0
on A′, but outside of A′ equals unity.

A cylinder set is a Borel cylinder set if the corresponding set
A′ is a Borel set. All Borel cylinder sets of the space $R^M$ form a
field, which we shall henceforth denote by $\mathfrak{F}^M$⁴.

The Borel extension of the field $\mathfrak{F}^M$ we shall denote, as always,
by $B\mathfrak{F}^M$. Sets in $B\mathfrak{F}^M$ we shall call Borel sets of the space $R^M$.

Later on we shall give a method of constructing and operating
with probability functions on $\mathfrak{F}^M$, and consequently, by means of
the Extension Theorem, on $B\mathfrak{F}^M$ also. We obtain in this manner
fields of probability sufficient for all purposes in the case that the
set M is denumerable. We can therefore handle all questions
touching upon a denumerable sequence of random variables. But
if M is not denumerable, many simple and interesting subsets of
$R^M$ remain outside of $B\mathfrak{F}^M$. For example, the set of all elements ξ
for which $x_\mu$ remains smaller than a fixed constant for all
indices μ, does not belong to the system $B\mathfrak{F}^M$ if the set M is
non-denumerable.

It is therefore desirable to try whenever possible to put each
problem in such a form that the space of all elementary events ξ
has only a denumerable set of coordinates.

Let a probability function P(A) be defined on % M . We may 
then regard every coordinate % M of the elementary event £ 
as a random variable. In consequence, every finite group 
( x rii> x m»> - • •* x fJ °f these coordinates has an ^-dimensional 
probability function P^....^^) and a corresponding distribu- 



⁴ From the above it follows that Borel cylinder sets are Borel sets definable by relations of type (1). Now let $A$ and $B$ be two Borel cylinder sets defined by the relations

$$f(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n}) = 0, \qquad g(x_{\lambda_1}, x_{\lambda_2}, \ldots, x_{\lambda_m}) = 0.$$

Then we can define the sets $A + B$, $AB$, and $A - B$ respectively by the relations

$$f \cdot g = 0, \qquad f^2 + g^2 = 0, \qquad f^2 + \omega(g) = 0,$$

where $\omega(x) = 0$ for $x \neq 0$ and $\omega(0) = 1$. If $f$ and $g$ are Borel functions, so also are $f \cdot g$, $f^2 + g^2$, and $f^2 + \omega(g)$; therefore, $A + B$, $AB$, and $A - B$ are Borel cylinder sets. Thus we have shown that the system of sets $\mathfrak{F}^M$ is a field.




It is obvious that for every Borel cylinder set

$$A = P^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(A')$$

the following equation holds:

$$P(A) = P_{\mu_1 \mu_2 \ldots \mu_n}(A'),$$

where $A'$ is a Borel set of $R^n$. In this manner, the probability function $P$ is uniquely determined on the field $\mathfrak{F}^M$ of all cylinder sets by means of the values of all finite probability functions $P_{\mu_1 \mu_2 \ldots \mu_n}$ for all Borel sets of the corresponding spaces $R^n$. However, for Borel sets, the values of the probability functions $P_{\mu_1 \mu_2 \ldots \mu_n}$ are uniquely determined by means of the corresponding distribution functions. We have thus proved the following theorem:

The set of all finite-dimensional distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$ uniquely determines the probability function $P(A)$ for all sets in $\mathfrak{F}^M$. If $P(A)$ is defined on $\mathfrak{F}^M$, then (according to the extension theorem) it is uniquely determined on $B\mathfrak{F}^M$ by the values of the distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$.

We may now ask the following. Under what conditions does a system of distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$, given a priori, define a field of probability on $\mathfrak{F}^M$ (and, consequently, on $B\mathfrak{F}^M$)?

We must first note that every distribution function $F_{\mu_1 \mu_2 \ldots \mu_n}$ must satisfy the conditions given in § 3, III of the second chapter; indeed, this is contained in the very concept of a distribution function. Besides, as a result of formulas (12) and (13) in § 3, we have also the following relations:

$$F_{\mu_{i_1} \mu_{i_2} \ldots \mu_{i_n}}(a_{i_1}, a_{i_2}, \ldots, a_{i_n}) = F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, a_2, \ldots, a_n), \tag{2}$$

$$F_{\mu_1 \mu_2 \ldots \mu_k}(a_1, a_2, \ldots, a_k) = F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, a_2, \ldots, a_k, +\infty, \ldots, +\infty), \tag{3}$$

where $k < n$ and $\begin{pmatrix} 1 & 2 & \cdots & n \\ i_1 & i_2 & \cdots & i_n \end{pmatrix}$ is an arbitrary permutation.

These necessary conditions prove also to be sufficient, as will appear from the following theorem.

Fundamental Theorem: Every system of distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$, satisfying the conditions (2) and (3), defines a probability function $P(A)$ on $\mathfrak{F}^M$ which satisfies Axioms I-VI. This probability function $P(A)$ can be extended (by the extension theorem) to $B\mathfrak{F}^M$ also.
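As a modern aside (ours, not part of the original text), the consistency conditions (2) and (3) are easy to exhibit numerically for a concrete system of distribution functions. The minimal Python sketch below assumes independent coordinates with the standard normal distribution function $\Phi$, so that $F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, \ldots, a_n) = \Phi(a_1)\cdots\Phi(a_n)$; the test points and tolerances are arbitrary choices.

```python
import math

def Phi(a):
    # Standard normal distribution function.
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

def F(args):
    # Product family: F(a_1, ..., a_n) = Phi(a_1) * ... * Phi(a_n),
    # the same for every finite system of indices mu_1, ..., mu_n.
    p = 1.0
    for a in args:
        p *= Phi(a)
    return p

a = [0.3, -1.2, 0.7]
perm = [2, 0, 1]

# Condition (2): permuting the indices together with the arguments
# changes nothing (trivially true for this exchangeable family).
assert abs(F([a[i] for i in perm]) - F(a)) < 1e-12

# Condition (3): letting the last argument tend to +infinity yields the
# distribution function of the subsystem (1e9 stands in for +infinity).
assert abs(F(a[:2] + [1e9]) - F(a[:2])) < 1e-9

print("conditions (2) and (3) hold for the product family")
```

Any family violating (2) or (3) could not arise from a single probability function on $\mathfrak{F}^M$, which is the content of the Fundamental Theorem.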




Proof. Given the distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$, satisfying the general conditions of Chap. II, § 3, III and also conditions (2) and (3). Every distribution function $F_{\mu_1 \mu_2 \ldots \mu_n}$ defines uniquely a corresponding probability function $P_{\mu_1 \mu_2 \ldots \mu_n}$ for all Borel sets of $R^n$ (cf. § 3). We shall deal in the future only with Borel sets of $R^n$ and with Borel cylinder sets in $E$.

For every cylinder set

$$A = P^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(A'),$$

we set

$$P(A) = P_{\mu_1 \mu_2 \ldots \mu_n}(A'). \tag{4}$$

Since the same cylinder set $A$ can be defined by various sets $A'$, we must first show that formula (4) yields always the same value for $P(A)$.

Let $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$ be a finite system of random variables $x_\mu$. Proceeding from the probability function $P_{\mu_1 \mu_2 \ldots \mu_n}$ of these random variables, we can, in accordance with the rules in § 3, define the probability function $P_{\mu_{i_1} \mu_{i_2} \ldots \mu_{i_k}}$ of each subsystem $(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \ldots, x_{\mu_{i_k}})$. From equations (2) and (3) it follows that this probability function defined according to § 3 is the same as the function $P_{\mu_{i_1} \mu_{i_2} \ldots \mu_{i_k}}$ given a priori. We shall now suppose that the cylinder set $A$ is defined by means of

$$A = P^{-1}_{\mu_{i_1} \mu_{i_2} \ldots \mu_{i_k}}(A')$$

and simultaneously by means of

$$A = P^{-1}_{\mu_{j_1} \mu_{j_2} \ldots \mu_{j_m}}(A''),$$

where all the random variables involved belong to the system $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$, which is obviously not an essential restriction. The conditions

$$(x_{\mu_{i_1}}, x_{\mu_{i_2}}, \ldots, x_{\mu_{i_k}}) \subset A'$$

and

$$(x_{\mu_{j_1}}, x_{\mu_{j_2}}, \ldots, x_{\mu_{j_m}}) \subset A''$$

are equivalent. Therefore

$$P_{\mu_{i_1} \ldots \mu_{i_k}}(A') = P_{\mu_1 \ldots \mu_n}\{(x_{\mu_{i_1}}, \ldots, x_{\mu_{i_k}}) \subset A'\} = P_{\mu_1 \ldots \mu_n}\{(x_{\mu_{j_1}}, \ldots, x_{\mu_{j_m}}) \subset A''\} = P_{\mu_{j_1} \ldots \mu_{j_m}}(A''),$$

which proves our statement concerning the uniqueness of the definition of $P(A)$.




Let us now prove that the field of probability $(\mathfrak{F}^M, P)$ satisfies all the Axioms I-VI. Axiom I requires merely that $\mathfrak{F}^M$ be a field. This fact has already been proven above. Moreover, for an arbitrary $\mu$:

$$P(E) = P_\mu(R^1) = 1,$$

which proves that Axioms II and IV apply in this case. Finally, from the definition of $P(A)$ it follows at once that $P(A)$ is non-negative (Axiom III).

It is only slightly more complicated to prove that Axiom V is also satisfied. In order to do so, we investigate two cylinder sets

$$A = P^{-1}_{\mu_{i_1} \ldots \mu_{i_k}}(A') \quad \text{and} \quad B = P^{-1}_{\mu_{j_1} \ldots \mu_{j_m}}(B').$$

We shall assume that all the variables $x_{\mu_i}$ and $x_{\mu_j}$ belong to one inclusive finite system $(x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n})$. If the sets $A$ and $B$ do not intersect, the relations

$$(x_{\mu_{i_1}}, \ldots, x_{\mu_{i_k}}) \subset A' \quad \text{and} \quad (x_{\mu_{j_1}}, \ldots, x_{\mu_{j_m}}) \subset B'$$

are incompatible. Therefore

$$P(A + B) = P_{\mu_1 \ldots \mu_n}\{(x_{\mu_{i_1}}, \ldots, x_{\mu_{i_k}}) \subset A' \text{ or } (x_{\mu_{j_1}}, \ldots, x_{\mu_{j_m}}) \subset B'\}$$
$$= P_{\mu_1 \ldots \mu_n}\{(x_{\mu_{i_1}}, \ldots, x_{\mu_{i_k}}) \subset A'\} + P_{\mu_1 \ldots \mu_n}\{(x_{\mu_{j_1}}, \ldots, x_{\mu_{j_m}}) \subset B'\} = P(A) + P(B),$$

which concludes our proof.

Only Axiom VI remains. Let

$$A_1 \supset A_2 \supset \cdots \supset A_n \supset \cdots$$

be a decreasing sequence of cylinder sets satisfying the condition

$$\lim_n P(A_n) = L > 0.$$

We shall prove that the product of all sets $A_n$ is not empty. We may assume, without essentially restricting the problem, that in the definition of the first $n$ cylinder sets $A_k$, only the first $n$ coordinates $x_{\mu_k}$ in the sequence

$$x_{\mu_1}, x_{\mu_2}, \ldots, x_{\mu_n}, \ldots$$




occur, i.e.

$$A_n = P^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(B_n).$$

For brevity we set

$$P_{\mu_1 \mu_2 \ldots \mu_n}(B) = P_n(B);$$

then, obviously,

$$P_n(B_n) = P(A_n) \geq L > 0.$$

In each set $B_n$ it is possible to find a closed bounded set $U_n$ such that

$$P_n(B_n - U_n) \leq \frac{\varepsilon}{2^n}.$$

From this inequality we have for the set

$$V_n = P^{-1}_{\mu_1 \mu_2 \ldots \mu_n}(U_n)$$

the inequality

$$P(A_n - V_n) \leq \frac{\varepsilon}{2^n}. \tag{5}$$

Let, moreover,

$$W_n = V_1 V_2 \cdots V_n.$$

From (5) it follows that

$$P(A_n - W_n) \leq \varepsilon.$$

Since $W_n \subset V_n \subset A_n$, it follows that

$$P(W_n) \geq P(A_n) - \varepsilon \geq L - \varepsilon.$$

If $\varepsilon$ is sufficiently small, $P(W_n) > 0$ and $W_n$ is not empty. We shall now choose in each set $W_n$ a point $\xi^{(n)}$ with the coordinates $x^{(n)}_{\mu_k}$. Every point $\xi^{(n+p)}$, $p = 0, 1, 2, \ldots$, belongs to the set $V_n$; therefore

$$(x^{(n+p)}_{\mu_1}, x^{(n+p)}_{\mu_2}, \ldots, x^{(n+p)}_{\mu_n}) = P_{\mu_1 \mu_2 \ldots \mu_n}(\xi^{(n+p)}) \subset U_n.$$

Since the sets $U_n$ are bounded we may (by the diagonal method) choose from the sequence $\{\xi^{(n)}\}$ a subsequence

$$\xi^{(n_1)}, \xi^{(n_2)}, \ldots, \xi^{(n_i)}, \ldots,$$

for which the corresponding coordinates $x^{(n_i)}_{\mu_k}$ tend for any $k$ to a definite limit $x_k$. Let, finally, $\xi$ be a point in the set $E$ with the coordinates

$$x_{\mu_k} = x_k, \qquad x_\mu = 0 \quad \text{for} \quad \mu \neq \mu_k, \ k = 1, 2, 3, \ldots$$




As the limit of the sequence $(x^{(n_i)}_{\mu_1}, x^{(n_i)}_{\mu_2}, \ldots, x^{(n_i)}_{\mu_k})$, $i = 1, 2, 3, \ldots$, the point $(x_1, x_2, \ldots, x_k)$ belongs to the set $U_k$. Therefore, $\xi$ belongs to

$$V_k = P^{-1}_{\mu_1 \mu_2 \ldots \mu_k}(U_k)$$

for any $k$ and therefore to the product

$$\prod_k A_k,$$

which is consequently not empty. This was to be proved.

§ 5. Equivalent Random Variables; Various Kinds of Convergence

Starting with this paragraph, we deal exclusively with Borel fields of probability. As we have already explained in § 2 of the second chapter, this does not constitute any essential restriction on our investigations.

Two random variables $x$ and $y$ are called equivalent if the probability of the relation $x \neq y$ is equal to zero. It is obvious that two equivalent random variables have the same probability function:

$$P^{(x)}(A) = P^{(y)}(A).$$

Therefore, the distribution functions $F^{(x)}$ and $F^{(y)}$ are also identical. In many problems in the theory of probability we may substitute for any random variable any equivalent variable.

Now let

$$x_1, x_2, \ldots, x_n, \ldots \tag{1}$$

be a sequence of random variables. Let us study the set $A$ of all elementary events $\xi$ for which the sequence (1) converges. If we denote by $A^{(m)}_{np}$ the set of $\xi$ for which all the following inequalities hold:

$$|x_{n+k} - x_n| < \frac{1}{m}, \qquad k = 1, 2, \ldots, p,$$

then we obtain at once

$$A = \prod_m \sum_n \prod_p A^{(m)}_{np}. \tag{2}$$

According to § 3, the set $A^{(m)}_{np}$ always belongs to the field $\mathfrak{F}$. The relation (2) shows that $A$, too, belongs to $\mathfrak{F}$. We may, therefore, speak of the probability of convergence of a sequence of random variables, for it always has a perfectly definite meaning.

Now let the probability $P(A)$ of the convergence set $A$ be equal to unity. We may then state that the sequence (1) converges with the probability one to a random variable $x$, where


the random variable $x$ is uniquely defined except for equivalence. To determine such a random variable we set

$$x = \lim_{n \to \infty} x_n$$

on $A$, and $x = 0$ outside of $A$. We have to show that $x$ is a random variable, in other words, that the set $A(a)$ of the elements $\xi$ for which $x < a$ belongs to $\mathfrak{F}$. But

$$A(a) = A \sum_n \prod_p \{x_{n+p} < a\}$$

in case $a \leq 0$, and

$$A(a) = A \sum_n \prod_p \{x_{n+p} < a\} + \bar{A}$$

in the opposite case, from which our statement follows at once.
If the probability of convergence of the sequence (1) to x 
equals one, then we say that the sequence (1) converges almost 
surely to x. However, for the theory of probability, another con- 
ception of convergence is possibly more important. 

Definition. The sequence $x_1, x_2, \ldots, x_n, \ldots$ of random variables converges in probability (converge en probabilité) to the random variable $x$ if, for any $\varepsilon > 0$, the probability

$$P\{|x_n - x| > \varepsilon\}$$

tends toward zero as $n \to \infty$.⁵

I. If the sequence (1) converges in probability to $x$ and also to $x'$, then $x$ and $x'$ are equivalent. In fact,

$$P\left\{|x - x'| > \frac{1}{m}\right\} \leq P\left\{|x - x_n| > \frac{1}{2m}\right\} + P\left\{|x_n - x'| > \frac{1}{2m}\right\};$$

since the last probabilities are as small as we please for a sufficiently large $n$, it follows that

$$P\left\{|x - x'| > \frac{1}{m}\right\} = 0,$$

and we obtain at once that

$$P\{x \neq x'\} \leq \sum_m P\left\{|x - x'| > \frac{1}{m}\right\} = 0.$$

II. If the sequence (1) almost surely converges to $x$, then it

⁵ This concept is due to Bernoulli; its completely general treatment was introduced by E. E. Slutsky (see [1]).




also converges to $x$ in probability. Let $A$ be the convergence set of the sequence (1); then

$$1 = P(A) \leq \lim_n P\{|x_{n+p} - x| < \varepsilon, \ p = 0, 1, 2, \ldots\} \leq \lim_n P\{|x_n - x| < \varepsilon\},$$

from which the convergence in probability follows.

III. For the convergence in probability of the sequence (1) the following condition is both necessary and sufficient: for any $\varepsilon > 0$ there exists an $n$ such that, for every $p > 0$, the following inequality holds:

$$P\{|x_{n+p} - x_n| > \varepsilon\} < \varepsilon.$$

Let $F_1(a), F_2(a), \ldots, F_n(a), \ldots, F(a)$ be the distribution functions of the random variables $x_1, x_2, \ldots, x_n, \ldots, x$. If the sequence $x_n$ converges in probability to $x$, the distribution function $F(a)$ is uniquely determined by knowledge of the functions $F_n(a)$. We have, in fact,

Theorem: If the sequence $x_1, x_2, \ldots, x_n, \ldots$ converges in probability to $x$, the corresponding sequence of distribution functions $F_n(a)$ converges, at each point of continuity of $F(a)$, to the distribution function $F(a)$ of $x$.

That $F(a)$ is really determined by the $F_n(a)$ follows from the fact that $F(a)$, being a monotone function, continuous on the left, is uniquely determined by its values at the points of continuity⁶. To prove the theorem we assume that $F$ is continuous at the point $a$. Let $a' < a$; then in case $x < a'$, $x_n \geq a$, it is necessary that $|x_n - x| > a - a'$. Therefore

$$\lim_n P(x < a', x_n \geq a) = 0,$$

$$F(a') = P(x < a') \leq P(x_n < a) + P(x < a', x_n \geq a) = F_n(a) + P(x < a', x_n \geq a),$$

$$F(a') \leq \liminf_n F_n(a) + \lim_n P(x < a', x_n \geq a),$$

$$F(a') \leq \liminf_n F_n(a). \tag{3}$$

In an analogous manner, we can prove that from $a'' > a$ there follows the relation

$$F(a'') \geq \limsup_n F_n(a). \tag{4}$$



⁶ In fact, it has at most only a countable set of discontinuities (see Lebesgue, Leçons sur l'intégration, 1928, p. 50). Therefore, the points of continuity are everywhere dense, and the value of the function $F(a)$ at a point of discontinuity is determined as the limit of its values at the points of continuity on its left.




Since F(a') and F(a") converge to F(a) for a' — * a and a" — ► a, 
it follows from (3) and (4) that 

limF B (a) = F(a), 

which proves our theorem. 
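The theorem admits a quick empirical check. In the Python sketch below (an illustration of ours, with all distributions chosen arbitrarily), $x_n = x + z_n/n$ for a standard normal $x$ and independent normal disturbances $z_n$, so that $x_n$ converges in probability to $x$; the relative frequency of $|x_n - x| > \varepsilon$ falls, and the empirical value of $F_n(a)$ approaches $F(a)$ at a continuity point $a$.

```python
import math
import random

random.seed(0)
N = 100_000                        # sample size for the empirical estimates
xs = [random.gauss(0.0, 1.0) for _ in range(N)]

def Phi(a):
    # Distribution function of x (standard normal), for comparison.
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))

eps, a = 0.1, 0.5
for n in (1, 10, 100):
    # x_n = x + z_n / n converges in probability to x as n grows.
    xns = [x + random.gauss(0.0, 1.0) / n for x in xs]
    p_far = sum(abs(xn - x) > eps for xn, x in zip(xns, xs)) / N
    Fn_a = sum(xn < a for xn in xns) / N
    print(n, round(p_far, 4), round(Fn_a, 4), round(Phi(a), 4))
# P{|x_n - x| > eps} drops toward zero and F_n(a) approaches F(a).
```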



Chapter IV 



MATHEMATICAL EXPECTATIONS¹

§ 1. Abstract Lebesgue Integrals 

Let $x$ be a random variable and $A$ a set of $\mathfrak{F}$. Let us form, for a positive $\lambda$, the sum

$$S_\lambda = \sum_{k=-\infty}^{k=+\infty} k\lambda\,P\{k\lambda \leq x < (k+1)\lambda, \ \xi \subset A\}. \tag{1}$$

If this series converges absolutely for every $\lambda$, then as $\lambda \to 0$, $S_\lambda$ tends toward a definite limit, which is by definition the integral

$$\int_A x\,P(dE). \tag{2}$$

In this abstract form the concept of an integral was introduced by Fréchet²; it is indispensable for the theory of probability. (The reader will see in the following paragraphs that the usual definition of the conditional mathematical expectation of the variable $x$ under hypothesis $A$ coincides with the definition of the integral (2) except for a constant factor.)
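Definition (1) can be imitated numerically. The sketch below (ours; the exponential distribution and the sample size are arbitrary choices) computes $S_\lambda$ over the whole space for a simulated random variable, with the probabilities replaced by relative frequencies, and watches the sum settle toward the expectation as $\lambda \to 0$.

```python
import math
import random

random.seed(0)
N = 200_000
# Simulated random variable: exponential with mean 1, so E(x) = 1.
xs = [-math.log(1.0 - random.random()) for _ in range(N)]

def S(lam):
    # S_lambda = sum over k of k*lam * P{ k*lam <= x < (k+1)*lam },
    # with the probabilities estimated by relative frequencies.
    counts = {}
    for x in xs:
        k = math.floor(x / lam)
        counts[k] = counts.get(k, 0) + 1
    return sum(k * lam * c / N for k, c in counts.items())

for lam in (1.0, 0.5, 0.1, 0.01):
    print(lam, round(S(lam), 4))
print("sample mean:", round(sum(xs) / N, 4))
# S_lambda approaches the sample mean from below, within lambda,
# as the mesh is refined.
```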

We shall give here a brief survey of the most important 
properties of the integrals of form (2) . The reader will find their 
proofs in every textbook on real variables, although the proofs 
are usually carried out only in the case where P(A) is the Lebesgue 
measure of sets in R n . The extension of these proofs to the general 
case does not entail any new mathematical problem ; for the most 
part they remain word for word the same. 

I. If a random variable $x$ is integrable on $A$, then it is integrable on each subset $A'$ of $A$ belonging to $\mathfrak{F}$.

II. If $x$ is integrable on $A$, and $A$ is decomposed into no


1 As was stated in § 5 of the third chapter, we are considering in this, as well 
as in the following chapters, Borel fields of probability only. 

² Fréchet, Sur l'intégrale d'une fonctionnelle étendue à un ensemble abstrait, Bull. Soc. Math. France v. 43, 1915, p. 248.





more than a countable number of non-intersecting sets $A_n$ of $\mathfrak{F}$, then

$$\int_A x\,P(dE) = \sum_n \int_{A_n} x\,P(dE).$$

III. If $x$ is integrable, $|x|$ is also integrable, and in that case

$$\left|\int_A x\,P(dE)\right| \leq \int_A |x|\,P(dE).$$

IV. If in each event $\xi$ the inequalities $0 \leq y \leq x$ hold, then, along with $x$, $y$ is also integrable³, and in that case

$$\int_A y\,P(dE) \leq \int_A x\,P(dE).$$

V. If $m \leq x \leq M$, where $m$ and $M$ are two constants, then

$$m\,P(A) \leq \int_A x\,P(dE) \leq M\,P(A).$$

VI. If $x$ and $y$ are integrable, and $K$ and $L$ are two real constants, then $Kx + Ly$ is also integrable, and in this case

$$\int_A (Kx + Ly)\,P(dE) = K\int_A x\,P(dE) + L\int_A y\,P(dE).$$

VII. If the series

$$\sum_n \int_A |x_n|\,P(dE)$$

converges, then the series

$$\sum_n x_n = x$$

converges at each point of the set $A$ with the exception of a certain set $B$ for which $P(B) = 0$. If we set $x = 0$ everywhere except on $A - B$, then

$$\int_A x\,P(dE) = \sum_n \int_A x_n\,P(dE).$$

VIII. If $x$ and $y$ are equivalent ($P\{x \neq y\} = 0$), then for every set $A$ of $\mathfrak{F}$

$$\int_A x\,P(dE) = \int_A y\,P(dE). \tag{3}$$


³ It is assumed that $y$ is a random variable, i.e., in the terminology of the general theory of integration, measurable with respect to $\mathfrak{F}$.




IX. If (3) holds for every set $A$ of $\mathfrak{F}$, then $x$ and $y$ are equivalent.

From the foregoing definition of an integral we also obtain 
the following property, which is not found in the usual Lebesgue 
theory. 

X. Let $P_1(A)$ and $P_2(A)$ be two probability functions defined on the same field $\mathfrak{F}$, let $P(A) = P_1(A) + P_2(A)$, and let $x$ be integrable on $A$ relative to $P_1(A)$ and $P_2(A)$. Then

$$\int_A x\,P(dE) = \int_A x\,P_1(dE) + \int_A x\,P_2(dE).$$

XI. Every bounded random variable is integrable.

§ 2. Absolute and Conditional Mathematical Expectations 

Let $x$ be a random variable. The integral

$$E(x) = \int_E x\,P(dE)$$

is called in the theory of probability the mathematical expectation 
of the variable x. From the properties III, IV, V, VI, VII, VIII, 
XI, it follows that 

I. $|E(x)| \leq E(|x|)$;

II. $E(y) \leq E(x)$ if $0 \leq y \leq x$ everywhere;

III. $\inf(x) \leq E(x) \leq \sup(x)$;

IV. $E(Kx + Ly) = K\,E(x) + L\,E(y)$;

V. $E\left(\sum_n x_n\right) = \sum_n E(x_n)$, if the series $\sum_n E(|x_n|)$ converges;

VI. If $x$ and $y$ are equivalent, then

$$E(x) = E(y).$$

VII. Every bounded random variable has a mathematical 
expectation. 

From the definition of the integral, we have

$$E(x) = \lim_{m \to 0} \sum_{k=-\infty}^{k=+\infty} km\,P\{km \leq x < (k+1)m\}$$
$$= \lim_{m \to 0} \sum_{k=-\infty}^{k=+\infty} km\,\{F^{(x)}((k+1)m) - F^{(x)}(km)\}.$$




The second line is nothing more than the usual definition of the Stieltjes integral

$$\int_{-\infty}^{+\infty} a\,dF^{(x)}(a) = E(x). \tag{1}$$

Formula (1) may therefore serve as a definition of the mathematical expectation $E(x)$.

Now let $u$ be a function of the elementary event $\xi$, and let $x$ be a random variable defined as a single-valued function $x = x(u)$ of $u$. Then

$$P\{km \leq x < (k+1)m\} = P^{(u)}\{km \leq x(u) < (k+1)m\},$$

where $P^{(u)}(A)$ is the probability function of $u$. It then follows from the definition of the integral that

$$\int_E x\,P(dE) = \int_{E^{(u)}} x(u)\,P^{(u)}(dE^{(u)})$$

and, therefore,

$$E(x) = \int_{E^{(u)}} x(u)\,P^{(u)}(dE^{(u)}), \tag{2}$$

where $E^{(u)}$ denotes the set of all possible values of $u$.

In particular, when $u$ itself is a random variable we have

$$E(x) = \int_E x\,P(dE) = \int_{R^1} x(u)\,P^{(u)}(dR^1) = \int_{-\infty}^{+\infty} x(a)\,dF^{(u)}(a). \tag{3}$$

When $x(u)$ is continuous, the last integral in (3) is the ordinary Stieltjes integral. We must note, however, that the integral

$$\int_{-\infty}^{+\infty} x(a)\,dF^{(u)}(a)$$

can exist even when the mathematical expectation $E(x)$ does not. For the existence of $E(x)$, it is necessary and sufficient that the integral

$$\int_{-\infty}^{+\infty} |x(a)|\,dF^{(u)}(a)$$

be finite⁴.

If $u$ is a point $(u_1, u_2, \ldots, u_n)$ of the space $R^n$, then as a result of (2):



⁴ Cf. V. Glivenko, Sur les valeurs probables de fonctions, Rend. Accad. Lincei v. 8, 1928, pp. 480-483.




$$E(x) = \int \cdots \int x(u_1, u_2, \ldots, u_n)\,P^{(u_1, u_2, \ldots, u_n)}(dR^n). \tag{4}$$

We have already seen that the conditional probability $P_B(A)$ possesses all the properties of a probability function. The corresponding integral

$$E_B(x) = \int_E x\,P_B(dE) \tag{5}$$

we call the conditional mathematical expectation of the random variable $x$ with respect to the event $B$. Since

$$P_B(\bar{B}) = 0, \qquad \int_{\bar{B}} x\,P_B(dE) = 0,$$

we obtain from (5) the equation

$$E_B(x) = \int_E x\,P_B(dE) = \int_B x\,P_B(dE) + \int_{\bar{B}} x\,P_B(dE) = \int_B x\,P_B(dE).$$

We recall that, in case $A \subset B$,

$$P_B(A) = \frac{P(AB)}{P(B)} = \frac{P(A)}{P(B)};$$

we thus obtain

$$E_B(x) = \frac{1}{P(B)} \int_B x\,P(dE), \tag{6}$$

and from (6),

$$\int_B x\,P(dE) = P(B)\,E_B(x). \tag{7}$$

From (7) and the equality

$$\int_{A+B} x\,P(dE) = \int_A x\,P(dE) + \int_B x\,P(dE)$$

we obtain at last

$$E_{A+B}(x) = \frac{P(A)\,E_A(x) + P(B)\,E_B(x)}{P(A) + P(B)}, \tag{8}$$

and, in particular, we have the formula

$$E(x) = P(A)\,E_A(x) + P(\bar{A})\,E_{\bar{A}}(x). \tag{9}$$
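Formula (9) admits a direct frequency check. In the minimal Python sketch below (ours; the event $A = \{x > 0\}$ and the distribution of $x$ are arbitrary choices), the decomposition holds exactly on the sample, since it is an algebraic identity among the empirical averages.

```python
import random

random.seed(0)
N = 200_000
xs = [random.gauss(1.0, 2.0) for _ in range(N)]

A = [x for x in xs if x > 0]          # samples falling in A = {x > 0}
Abar = [x for x in xs if x <= 0]      # samples falling in the complement

pA = len(A) / N
E_A = sum(A) / len(A)                 # conditional expectation given A
E_Abar = sum(Abar) / len(Abar)        # conditional expectation given not-A

lhs = sum(xs) / N                     # E(x)
rhs = pA * E_A + (1.0 - pA) * E_Abar  # P(A)E_A(x) + P(A^c)E_{A^c}(x)
print(round(lhs, 4), round(rhs, 4))   # the two agree up to rounding
```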




§ 3. The Tchebycheff Inequality 

Let $f(x)$ be a non-negative function of a real argument $x$ which for $x \geq a$ never becomes smaller than $b > 0$. Then for any random variable $x$

$$P(x \geq a) \leq \frac{E\{f(x)\}}{b}, \tag{1}$$

provided the mathematical expectation $E\{f(x)\}$ exists. For

$$E\{f(x)\} = \int_E f(x)\,P(dE) \geq \int_{\{x \geq a\}} f(x)\,P(dE) \geq b\,P(x \geq a),$$

from which (1) follows at once.

For example, for every positive $c$,

$$P(x \geq a) \leq \frac{E(e^{cx})}{e^{ca}}. \tag{2}$$

Now let $f(x)$ be non-negative, even, and, for positive $x$, non-decreasing. Then for every random variable $x$ and for any choice of the constant $a > 0$ the following inequality holds:

$$P(|x| \geq a) \leq \frac{E\{f(x)\}}{f(a)}. \tag{3}$$

In particular,

$$P(|x - E(x)| \geq a) \leq \frac{E\{f(x - E(x))\}}{f(a)}. \tag{4}$$

Especially important is the case $f(x) = x^2$. We then obtain from (3) and (4)

$$P(|x| \geq a) \leq \frac{E(x^2)}{a^2}, \tag{5}$$

$$P(|x - E(x)| \geq a) \leq \frac{\sigma^2(x)}{a^2}, \tag{6}$$

where

$$\sigma^2(x) = E\{x - E(x)\}^2$$

is called the variance of the variable $x$. It is easy to calculate that

$$\sigma^2(x) = E(x^2) - \{E(x)\}^2.$$
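Inequality (6) can be watched in a simulation. The Python sketch below (ours; the uniform distribution and the thresholds are arbitrary choices) compares the observed frequency of large deviations against the Tchebycheff bound $\sigma^2/a^2$.

```python
import random

random.seed(0)
N = 1_000_000
# x uniform on [0, 1]: E(x) = 1/2, sigma^2(x) = 1/12.
xs = [random.random() for _ in range(N)]
mean, var = 0.5, 1.0 / 12.0

for a in (0.2, 0.3, 0.4):
    freq = sum(abs(x - mean) >= a for x in xs) / N
    bound = var / a ** 2
    print(a, round(freq, 4), "<=", round(bound, 4))
# The observed frequency of |x - E(x)| >= a never exceeds sigma^2/a^2.
```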

If $f(x)$ is bounded:

$$|f(x)| \leq K,$$

then a lower bound for $P(|x| \geq a)$ can be found. For



$$E\{f(x)\} = \int_E f(x)\,P(dE) = \int_{\{|x| < a\}} f(x)\,P(dE) + \int_{\{|x| \geq a\}} f(x)\,P(dE) \leq f(a) + K\,P(|x| \geq a),$$

and therefore

$$P(|x| \geq a) \geq \frac{E\{f(x)\} - f(a)}{K}. \tag{7}$$

If instead of $f(x)$ the random variable $x$ itself is bounded,

$$|x| \leq M,$$

then $f(x) \leq f(M)$, and instead of (7) we have the formula

$$P(|x| \geq a) \geq \frac{E\{f(x)\} - f(a)}{f(M)}. \tag{8}$$

In the case $f(x) = x^2$, we have from (8)

$$P(|x| \geq a) \geq \frac{E(x^2) - a^2}{M^2}. \tag{9}$$
§ 4. Some Criteria for Convergence 

Let

$$x_1, x_2, \ldots, x_n, \ldots \tag{1}$$

be a sequence of random variables and $f(x)$ a non-negative, even function, monotonically increasing for positive $x$⁵. Then the following theorems are true:

I. In order that the sequence (1) converge in probability, the following condition is sufficient: for each $\varepsilon > 0$ there exists an $n$ such that for every $p > 0$ the following inequality holds:

$$E\{f(x_{n+p} - x_n)\} < \varepsilon. \tag{2}$$

II. In order that the sequence (1) converge in probability to the random variable $x$, the following condition is sufficient:

$$\lim_{n \to +\infty} E\{f(x_n - x)\} = 0. \tag{3}$$

III. If $f(x)$ is bounded and continuous and $f(0) = 0$, then conditions I and II are also necessary.

IV. If $f(x)$ is continuous, $f(0) = 0$, and the totality of all $x_1, x_2, \ldots, x_n, \ldots, x$ is bounded, then conditions I and II are also necessary.



⁵ Therefore $f(x) > 0$ if $x \neq 0$.




From II and IV we obtain in particular:

V. In order that the sequence (1) converge in probability to $x$, it is sufficient that

$$\lim_n E(x_n - x)^2 = 0. \tag{4}$$

If also the totality of all $x_1, x_2, \ldots, x_n, \ldots, x$ is bounded, then the condition is also necessary.

For proofs of I-IV see Slutsky [1] and Fréchet [1]. However, these theorems follow almost immediately from formulas (3) and (8) of the preceding section.

§ 5. Differentiation and Integration of Mathematical Expectations 
with Respect to a Parameter 

Let us put each elementary event $\xi$ into correspondence with a definite real function $x(t)$ of a real variable $t$. We say that $x(t)$ is a random function if, for every fixed $t$, the variable $x(t)$ is a random variable. The question now arises: under what conditions can the mathematical expectation sign be interchanged with the integration and differentiation signs? The two following theorems, though they do not exhaust the problem, can nevertheless give a satisfactory answer to this question in many simple cases.

Theorem I: If the mathematical expectation $E[x(t)]$ is finite for any $t$, and $x(t)$ is always differentiable for any $t$, while the derivative $x'(t)$ of $x(t)$ with respect to $t$ is always less in absolute value than some constant $M$, then

$$\frac{d}{dt}\,E(x(t)) = E(x'(t)).$$

Theorem II: If $x(t)$ always remains less, in absolute value, than some constant $K$ and is integrable in the Riemann sense, then

$$\int_a^b E(x(t))\,dt = E\left[\int_a^b x(t)\,dt\right],$$

provided $E[x(t)]$ is integrable in the Riemann sense.

Proof of Theorem I. Let us first note that $x'(t)$, as the limit of the random variables

$$\frac{x(t+h) - x(t)}{h}, \qquad h = 1, \tfrac{1}{2}, \ldots, \tfrac{1}{n}, \ldots,$$

is also a random variable.






Since $x'(t)$ is bounded, the mathematical expectation $E[x'(t)]$ exists (Property VII of mathematical expectation, in § 2). Let us choose a fixed $t$ and denote by $A$ the event

$$\left|\frac{x(t+h) - x(t)}{h} - x'(t)\right| > \varepsilon.$$

The probability $P(A)$ tends to zero as $h \to 0$ for every $\varepsilon > 0$. Since

$$\left|\frac{x(t+h) - x(t)}{h}\right| \leq M, \qquad |x'(t)| \leq M$$

holds everywhere, and moreover in the case $\bar{A}$

$$\left|\frac{x(t+h) - x(t)}{h} - x'(t)\right| \leq \varepsilon,$$

then

$$\left|\frac{E\,x(t+h) - E\,x(t)}{h} - E\,x'(t)\right| \leq P(A)\,E_A\left|\frac{x(t+h) - x(t)}{h} - x'(t)\right| + P(\bar{A})\,E_{\bar{A}}\left|\frac{x(t+h) - x(t)}{h} - x'(t)\right| \leq 2M\,P(A) + \varepsilon.$$

We may choose the $\varepsilon > 0$ arbitrarily, and $P(A)$ is arbitrarily small for any sufficiently small $h$. Therefore

$$\frac{d}{dt}\,E\,x(t) = \lim_{h \to 0} \frac{E\,x(t+h) - E\,x(t)}{h} = E\,x'(t),$$



which was to be proved. 
Proof of Theorem II. Let

$$S_n = h \sum_{k=1}^{k=n} x(a + kh), \qquad h = \frac{b-a}{n}.$$

Since $S_n$ converges to $J = \int_a^b x(t)\,dt$, we can choose for any $\varepsilon > 0$ an $N$ such that from $n \geq N$ there follows the inequality

$$P(A) = P\{|S_n - J| > \varepsilon\} < \varepsilon.$$

If we set

$$S_n^* = h \sum_{k=1}^{k=n} E\,x(a + kh) = E(S_n),$$

then

$$|S_n^* - E(J)| = |E(S_n - J)| \leq E|S_n - J| = P(A)\,E_A|S_n - J| + P(\bar{A})\,E_{\bar{A}}|S_n - J| \leq 2K\,P(A) + \varepsilon \leq (2K+1)\,\varepsilon.$$




Therefore, $S_n^*$ converges to $E(J)$, from which results the equation

$$\int_a^b E\,x(t)\,dt = \lim_n S_n^* = E(J).$$

Theorem II can easily be generalized for double and triple and higher order multiple integrals. We shall give an application of this theorem to one example in geometric probability. Let $G$ be a measurable region of the plane whose shape depends on chance; in other words, let us assign to every elementary event $\xi$ of a field of probability a definite measurable plane region $G$. We shall denote by $J$ the area of the region $G$, and by $P(x, y)$ the probability that the point $(x, y)$ belongs to the region $G$. Then

$$E(J) = \iint P(x, y)\,dx\,dy.$$

To prove this it is sufficient to note that

$$J = \iint f(x, y)\,dx\,dy, \qquad P(x, y) = E\,f(x, y),$$

where $f(x, y)$ is the characteristic function of the region $G$ ($f(x, y) = 1$ on $G$ and $f(x, y) = 0$ outside of $G$)⁶.
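The identity $E(J) = \iint P(x, y)\,dx\,dy$ can be checked numerically. In the Python sketch below (an illustration of ours, with the random region chosen arbitrarily), $G$ is the disk of radius $R$ about the origin, $R$ uniform on $[0, 1]$; then $J = \pi R^2$ with $E(J) = \pi/3$, while $P(x, y) = 1 - r$ for $r = \sqrt{x^2 + y^2} \leq 1$, whose integral over the plane equals $\pi/3$ as well.

```python
import math
import random

random.seed(0)
N = 200_000
# Random region G: disk of radius R about the origin, R uniform on [0, 1].
radii = [random.random() for _ in range(N)]
EJ_mc = sum(math.pi * r * r for r in radii) / N   # Monte Carlo E(J)

# Integral of P(x, y) = 1 - r over the plane, in polar coordinates:
# int_0^1 (1 - r) 2 pi r dr, here by the midpoint rule.
steps = 10_000
integral = sum((1.0 - (i + 0.5) / steps) * 2.0 * math.pi * ((i + 0.5) / steps)
               for i in range(steps)) / steps

print(round(EJ_mc, 4), round(integral, 4), round(math.pi / 3.0, 4))
```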






⁶ Cf. A. Kolmogorov and M. Leontovich, Zur Berechnung der mittleren Brownschen Fläche, Physik. Zeitschr. d. Sowjetunion, v. 4, 1933.



Chapter V 

CONDITIONAL PROBABILITIES AND 
MATHEMATICAL EXPECTATIONS 

§ 1. Conditional Probabilities 

In § 6, Chapter I, we defined the conditional probability $P_{\mathfrak{A}}(B)$ of the event $B$ with respect to the trial $\mathfrak{A}$. It was there assumed that $\mathfrak{A}$ allows of only a finite number of different possible results. We can, however, define $P_{\mathfrak{A}}(B)$ also for the case of an $\mathfrak{A}$ with an infinite set of possible results, i.e. the case in which the set $E$ is partitioned into an infinite number of non-intersecting subsets. In particular, we obtain such a partitioning if we consider an arbitrary function $u$ of $\xi$ and define as elements of the partition $\mathfrak{A}_u$ the sets $u = \text{constant}$. The conditional probability $P_{\mathfrak{A}_u}(B)$ we also denote by $P_u(B)$. Any partitioning $\mathfrak{A}$ of the set $E$ can be defined as the partitioning $\mathfrak{A}_u$ which is "induced" by a function $u$ of $\xi$, if one assigns to every $\xi$, as $u(\xi)$, that set of the partitioning $\mathfrak{A}$ of $E$ which contains $\xi$.

Two functions $u$ and $u'$ of $\xi$ determine the same partitioning $\mathfrak{A}_u = \mathfrak{A}_{u'}$ of the set $E$ if and only if there exists a one-to-one correspondence $u' = f(u)$ between their domains $E^{(u)}$ and $E^{(u')}$ such that $u'(\xi)$ is identical with $fu(\xi)$. The reader can easily show that the random variables $P_u(B)$ and $P_{u'}(B)$, defined below, are in this case the same. They are thus determined, in fact, by the partition $\mathfrak{A}_u = \mathfrak{A}_{u'}$ itself.

To define $P_u(B)$ we may use the following equation:

$$P_{\{u \subset A\}}(B) = E_{\{u \subset A\}} P_u(B). \tag{1}$$

It is easy to prove that if the set $E^{(u)}$ of all possible values of $u$ is finite, equation (1) holds true for any choice of $A$ (when $P_u(B)$ is defined as in § 6, Chap. I). In the general case (in which $P_u(B)$ is not yet defined) we shall prove that there always exists one and only one random variable $P_u(B)$ (except for the matter of equivalence) which is defined as a function of $u$ and which satisfies equation (1) for every choice of $A$ from $\mathfrak{F}^{(u)}$ such that




$P^{(u)}(A) > 0$. The function $P_u(B)$ of $u$ thus determined to within equivalence we call the conditional probability of $B$ with respect to $u$ (or, for a given $u$). The value of $P_u(B)$ when $u = a$ we shall designate by $P_u(a; B)$.

The proof of the existence and uniqueness of $P_u(B)$. If we multiply (1) by $P\{u \subset A\} = P^{(u)}(A)$, we obtain, on the left,

$$P\{u \subset A\}\,P_{\{u \subset A\}}(B) = P(B\{u \subset A\}) = P(B\,u^{-1}(A))$$

and, on the right,

$$P\{u \subset A\}\,E_{\{u \subset A\}} P_u(B) = \int_{\{u \subset A\}} P_u(B)\,P(dE) = \int_A P_u(B)\,P^{(u)}(dE^{(u)}),$$

leading to the formula

$$P(B\,u^{-1}(A)) = \int_A P_u(B)\,P^{(u)}(dE^{(u)}), \tag{2}$$

and conversely (1) follows from (2). In the case $P^{(u)}(A) = 0$, in which case (1) is meaningless, equation (2) becomes trivially true. Condition (2) is thus equivalent to (1). In accordance with Property IX of the integral (§ 1, Chap. IV), the random variable $x$ is uniquely defined (except for equivalence) by means of the values of the integral

$$\int_A x\,P(dE)$$

for all sets of $\mathfrak{F}$. Since $P_u(B)$ is a random variable determined on the probability field $(\mathfrak{F}^{(u)}, P^{(u)})$, it follows that formula (2) uniquely determines this variable $P_u(B)$ except for equivalence.

We must still prove the existence of $P_u(B)$. We shall apply here the following theorem of Nikodym¹:

Let $\mathfrak{F}$ be a Borel field, $P(A)$ a non-negative completely additive set function defined on $\mathfrak{F}$ (in the terminology of the theory of probability, a probability function), and let $Q(A)$ be another completely additive set function defined on $\mathfrak{F}$, such that from $Q(A) \neq 0$ follows the inequality $P(A) > 0$. Then there exists a function $f(\xi)$ (in the terminology of the theory of probability, a random variable) which is measurable with respect to $\mathfrak{F}$, and which satisfies, for each set $A$ of $\mathfrak{F}$, the equation

¹ O. Nikodym, Sur une généralisation des intégrales de M. J. Radon, Fund. Math. v. 15, 1930, p. 168 (Theorem III).




$$Q(A) = \int_A f(\xi)\,P(dE).$$

In order to apply this theorem to our case, we need to prove: 1°, that

$$Q(A) = P(B\,u^{-1}(A))$$

is a completely additive function on $\mathfrak{F}^{(u)}$; 2°, that from $Q(A) \neq 0$ follows the inequality $P^{(u)}(A) > 0$.

Firstly, 2° follows from

$$0 \leq P(B\,u^{-1}(A)) \leq P(u^{-1}(A)) = P^{(u)}(A).$$

For the proof of 1° we set

$$A = \sum_n A_n;$$

then

$$u^{-1}(A) = \sum_n u^{-1}(A_n)$$

and

$$B\,u^{-1}(A) = \sum_n B\,u^{-1}(A_n).$$

Since $P$ is completely additive, it follows that

$$P(B\,u^{-1}(A)) = \sum_n P(B\,u^{-1}(A_n)),$$

which was to be proved.

From the equation (1) follows an important formula (if we set $A = E^{(u)}$):

$$P(B) = E(P_u(B)). \tag{3}$$
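Formula (3) can be watched with frequencies. The Python sketch below (ours; the discrete distribution of $u$ and the conditional probabilities are arbitrary choices) estimates $P_u(B)$ on each element of the partition and averages; the average reproduces the overall frequency of $B$ exactly, since it is an identity among the counts.

```python
import random

random.seed(0)
N = 300_000
# u takes the values 0, 1, 2 (a partition of E into three parts);
# given u, the event B occurs with probability q[u].
q = {0: 0.1, 1: 0.5, 2: 0.9}
w = {0: 0.2, 1: 0.3, 2: 0.5}               # distribution of u

hits = {0: [0, 0], 1: [0, 0], 2: [0, 0]}   # per value of u: [count, B-count]
total_B = 0
for _ in range(N):
    u = random.choices((0, 1, 2), weights=(w[0], w[1], w[2]))[0]
    b = random.random() < q[u]
    hits[u][0] += 1
    hits[u][1] += b
    total_B += b

# Empirical P_u(B) for each value of u, weighted by the frequency of u:
E_PuB = sum((hits[u][0] / N) * (hits[u][1] / hits[u][0]) for u in q)
print(round(total_B / N, 4), round(E_PuB, 4))
print("true value:", sum(w[u] * q[u] for u in q))
```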

Now we shall prove the following two fundamental properties of conditional probability.

Theorem I. It is almost sure that

$$0 \leq P_u(B) \leq 1. \tag{4}$$

Theorem II. If $B$ is decomposed into at most a countable number of sets $B_n$:

$$B = \sum_n B_n,$$

then the following equality holds almost surely:

$$P_u(B) = \sum_n P_u(B_n). \tag{5}$$

These two properties of $P_u(B)$ correspond to the two characteristic properties of the probability function $P(B)$: that $0 \leq P(B) \leq 1$ always, and that $P(B)$ is completely additive. These




allow us to carry over many other basic properties of the absolute probability $P(B)$ to the conditional probability $P_u(B)$. However, we must not forget that $P_u(B)$ is, for a fixed set $B$, a random variable determined uniquely only to within equivalence.

Proof of Theorem I. If we assume, contrary to the assertion to be proved, that on a set $M \subset E^{(u)}$ with $P^{(u)}(M) > 0$ the inequality $P_u(B) \geq 1 + \varepsilon$, $\varepsilon > 0$, holds true, then according to formula (1)

$$P_{\{u \subset M\}}(B) = E_{\{u \subset M\}} P_u(B) \geq 1 + \varepsilon,$$

which is obviously impossible. In the same way we prove that almost surely $P_u(B) \geq 0$.

Proof of Theorem II. From the convergence of the series

$$\sum_n E|P_u(B_n)| = \sum_n E(P_u(B_n)) = \sum_n P(B_n) = P(B)$$

it follows from Property V of mathematical expectation (Chap. IV, § 2) that the series

$$\sum_n P_u(B_n)$$

almost surely converges. Since the series

$$\sum_n E_{\{u \subset A\}}|P_u(B_n)| = \sum_n E_{\{u \subset A\}}(P_u(B_n)) = \sum_n P_{\{u \subset A\}}(B_n) = P_{\{u \subset A\}}(B)$$

converges for every choice of the set $A$ such that $P^{(u)}(A) > 0$, then from Property V of mathematical expectation just referred to it follows that for each $A$ of the above kind we have the relation

$$E_{\{u \subset A\}}\left(\sum_n P_u(B_n)\right) = \sum_n E_{\{u \subset A\}}(P_u(B_n)) = P_{\{u \subset A\}}(B) = E_{\{u \subset A\}}(P_u(B)),$$

and from this, equation (5) immediately follows.

To close this section we shall point out two particular cases. If, first, $u(\xi) = c$ (a constant), then $P_c(A) = P(A)$ almost surely. If, however, we set $u(\xi) = \xi$, then we obtain at once that $P_\xi(A)$ is almost surely equal to one on $A$ and is almost surely equal to zero on $\bar{A}$. $P_\xi(A)$ is thus revealed to be the characteristic function of the set $A$.

§ 2. Explanation of a Borel Paradox 

Let us choose for our basic set $E$ the set of all points on a spherical surface. Our $\mathfrak{F}$ will be the aggregate of all Borel sets of the spherical surface. And finally, our $P(A)$ is to be proportional to the measure of set $A$. Let us now choose two diametrically



opposite points for our poles, so that each meridian circle will be uniquely defined by the longitude $\psi$, $0 \leq \psi < \pi$. Since $\psi$ varies only from $0$ to $\pi$ (in other words, we are considering complete meridian circles, and not merely semicircles), the latitude $\Theta$ must vary from $-\pi$ to $+\pi$ (and not from $-\frac{\pi}{2}$ to $+\frac{\pi}{2}$). Borel set the following problem: required to determine "the conditional probability distribution" of the latitude $\Theta$, $-\pi \leq \Theta < +\pi$, for a given longitude $\psi$.

It is easy to calculate that

$$P_\psi\{\Theta_1 \leq \Theta < \Theta_2\} = \frac{1}{4}\int_{\Theta_1}^{\Theta_2} |\cos\Theta|\,d\Theta.$$

The probability distribution of $\Theta$ for a given $\psi$ is not uniform.

If we assume that the conditional probability distribution of $\Theta$ "with the hypothesis that $\xi$ lies on the given meridian circle" must be uniform, then we have arrived at a contradiction.

This shows that the concept of a conditional probability with regard to an isolated given hypothesis whose probability equals zero is inadmissible. For we can obtain a probability distribution for $\Theta$ on the meridian circle only if we regard this circle as an element of the decomposition of the entire spherical surface into meridian circles with the given poles.
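The paradox can be probed by simulation. The Python sketch below (ours; the band width, longitude, and sample size are arbitrary) draws uniform points on the sphere and conditions on a thin longitude band; for simplicity it uses a half-meridian, with latitude in $(-\pi/2, +\pi/2)$, where the analogous computation gives the conditional density $\cos\Theta/2$ rather than a uniform one.

```python
import math
import random

random.seed(0)
N = 1_000_000
delta = 0.01          # half-width of the thin band about the meridian
psi0 = 1.0            # the chosen longitude (arbitrary)

lats = []
for _ in range(N):
    z = random.uniform(-1.0, 1.0)            # uniform point on the sphere:
    lam = random.uniform(-math.pi, math.pi)  # z uniform, longitude uniform
    if abs(lam - psi0) < delta:
        lats.append(math.asin(z))            # latitude of a point in the band

def frac(lo, hi):
    return sum(lo <= t < hi for t in lats) / len(lats)

# The conditional density of the latitude is cos(t)/2 on (-pi/2, pi/2),
# so two intervals of equal length receive unequal probability:
print(round(frac(0.0, math.pi / 6), 3),
      "vs", round(frac(math.pi / 3, math.pi / 2), 3))
print("predicted:", 0.25, "vs", round((1 - math.sin(math.pi / 3)) / 2, 3))
```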

§ 3. Conditional Probabilities with Respect to a Random Variable 

If $x$ is a random variable and $P_x(B)$ as a function of $x$ is measurable in the Borel sense, then $P_x(B)$ can be defined in an elementary way. For we can rewrite formula (2) in § 1 to look as follows:

$$P(B)\,P_B^{(x)}(A) = \int_A P_x(B)\,P^{(x)}(dE). \tag{1}$$

In this case we obtain from (1) at once that

$$P(B)\,F_B^{(x)}(a) = \int_{-\infty}^{a} P_x(a; B)\,dF^{(x)}(a). \tag{2}$$

In accordance with a theorem of Lebesgue² it follows from (2) that

$$P_x(a; B) = P(B)\,\lim_{h \to 0} \frac{F_B^{(x)}(a+h) - F_B^{(x)}(a)}{F^{(x)}(a+h) - F^{(x)}(a)}, \tag{3}$$

which is always true except for a set $H$ of points $a$ for which $P^{(x)}(H) = 0$.

² Lebesgue, l.c., 1928, pp. 301-302.




$P_x(a; B)$ was defined in § 1 except on a set $G$ which is such that $P^{(x)}(G) = 0$. If we now regard formula (3) as the definition of $P_x(a; B)$ (setting $P_x(a; B) = 0$ when the limit in the right hand side of (3) fails to exist), then this new variable satisfies all the requirements of § 1.

If, besides, the probability densities $f^{(x)}(a)$ and $f_B^{(x)}(a)$ exist and if $f^{(x)}(a) > 0$, then formula (3) becomes

$$P_x(a; B) = P(B)\,\frac{f_B^{(x)}(a)}{f^{(x)}(a)}. \tag{4}$$

Moreover, from formula (3) it follows that the existence of a limit in (3) and of a probability density $f^{(x)}(a)$ results in the existence of $f_B^{(x)}(a)$. In that case

$$P_x(a; B)\,f^{(x)}(a) = P(B)\,f_B^{(x)}(a). \tag{5}$$

If $P(B) > 0$, then from (4) we have

$$f_B^{(x)}(a) = \frac{P_x(a; B)\,f^{(x)}(a)}{P(B)}. \tag{6}$$

In case $f^{(x)}(a) = 0$, then according to (5) $f_B^{(x)}(a) = 0$, and therefore (6) also holds. If, besides, the distribution of $x$ is continuous, we have

$$P(B) = E(P_x(B)) = \int_{-\infty}^{+\infty} P_x(a; B)\,dF^{(x)}(a) = \int_{-\infty}^{+\infty} P_x(a; B)\,f^{(x)}(a)\,da. \tag{7}$$

From (6) and (7) we obtain

$$f_B^{(x)}(a) = \frac{P_x(a; B)\,f^{(x)}(a)}{\int_{-\infty}^{+\infty} P_x(a; B)\,f^{(x)}(a)\,da}. \tag{8}$$

This equation gives us the so-called Bayes' Theorem for continuous distributions. The assumptions under which this theorem is proved are these: $P_x(B)$ is measurable in the Borel sense and at the point $a$ is defined by formula (3), the distribution of $x$ is continuous, and at the point $a$ there exists a probability density $f^{(x)}(a)$.
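Bayes' Theorem (8) is easily exercised numerically. In the Python sketch below (ours; the standard normal prior and the logistic form of $P_x(a; B)$ are arbitrary choices), the posterior probability of $\{x < 0\}$ given $B$ is computed once from formula (8) by quadrature and once by direct simulation.

```python
import math
import random

random.seed(0)

def prior(a):   # f^(x)(a): standard normal density
    return math.exp(-a * a / 2.0) / math.sqrt(2.0 * math.pi)

def PxB(a):     # P_x(a; B), chosen here as a logistic curve
    return 1.0 / (1.0 + math.exp(-a))

# Denominator of (8): integral of P_x(a; B) f(a) da, by the midpoint rule.
lo, hi, steps = -8.0, 8.0, 4000
h = (hi - lo) / steps
PB = sum(PxB(lo + (i + 0.5) * h) * prior(lo + (i + 0.5) * h)
         for i in range(steps)) * h

def posterior(a):            # f_B^(x)(a) from Bayes' formula (8)
    return PxB(a) * prior(a) / PB

# Direct simulation: draw x, decide B with probability P_x(x; B), and
# among the samples where B occurred count those with x < 0.
N, occurred, below = 400_000, 0, 0
for _ in range(N):
    x = random.gauss(0.0, 1.0)
    if random.random() < PxB(x):
        occurred += 1
        below += x < 0.0

quad = sum(posterior(lo + (i + 0.5) * h) for i in range(steps // 2)) * h
print(round(below / occurred, 3), round(quad, 3))   # P_B(x < 0) both ways
```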

§ 4. Conditional Mathematical Expectations 

Let $u$ be an arbitrary function of $\xi$, and $y$ a random variable. The random variable $E_u(y)$, representable as a function of $u$ and satisfying, for any set $A$ of $\mathfrak{F}^{(u)}$ with $P^{(u)}(A) > 0$, the condition





$$E_{\{u \subset A\}}(y) = E_{\{u \subset A\}} E_u(y), \tag{1}$$

is called (if it exists) the conditional mathematical expectation of the variable $y$ for known value of $u$.

If we multiply (1) by $P^{(u)}(A)$, we obtain

$$\int_{\{u \subset A\}} y\,P(dE) = \int_A E_u(y)\,P^{(u)}(dE^{(u)}). \tag{2}$$

Conversely, from (2) follows formula (1). In case $P^{(u)}(A) = 0$, in which case (1) is meaningless, (2) becomes trivial. In the same manner as in the case of conditional probability (§ 1) we can prove that $E_u(y)$ is determined uniquely, except for equivalence, by (2).

The value of $E_u(y)$ for $u = a$ we shall denote by $E_u(a; y)$. Let us also note that $E_u(y)$, as well as $P_u(B)$, depends only upon the partition $\mathfrak{A}_u$ and may be designated by $E_{\mathfrak{A}_u}(y)$.

The existence of $E(y)$ is implied in the definition of $E_u(y)$ (if we set $A = E^{(u)}$, then $E_{\{u \subset A\}}(y) = E(y)$).

We shall now prove that the existence of $E(y)$ is also sufficient for the existence of $E_u(y)$. For this we only need to prove that, by the theorem of Nikodym (§ 1), the set function

$$Q(A) = \int_{\{u \subset A\}} y\,P(dE)$$

is completely additive on $\mathfrak{F}^{(u)}$ and absolutely continuous with respect to $P^{(u)}(A)$. The first property is proved verbatim as in the case of conditional probability (§ 1). The second property, absolute continuity, is contained in the fact that from $Q(A) \neq 0$ the inequality $P^{(u)}(A) > 0$ must follow. If we assume that $P^{(u)}(A) = P\{u \subset A\} = 0$, it is clear that

$$Q(A) = \int_{\{u \subset A\}} y\,P(dE) = 0,$$

and our second requirement is thus fulfilled.

If in equation (1) we set $A = E^{(u)}$, we obtain the formula

$$E(y) = E\,E_u(y). \tag{3}$$

We can show further that almost surely

$$E_u(ay + bz) = a\,E_u(y) + b\,E_u(z), \tag{4}$$

where $a$ and $b$ are two arbitrary constants. (The proof is left to the reader.)




If $u$ and $v$ are two functions of the elementary event $\xi$, then the couple $(u, v)$ can always be regarded as a function of $\xi$. The following important equation then holds:

$$E_u E_{(u,v)}(y) = E_u(y). \tag{5}$$

For $E_u(y)$ is defined by the relation

$$E_{\{u \subset A\}}(y) = E_{\{u \subset A\}} E_u(y).$$

Therefore we must show that $E_u E_{(u,v)}(y)$ satisfies the equation

$$E_{\{u \subset A\}}(y) = E_{\{u \subset A\}} E_u E_{(u,v)}(y). \tag{6}$$

From the definition of $E_{(u,v)}(y)$ it follows that

$$E_{\{u \subset A\}}(y) = E_{\{u \subset A\}} E_{(u,v)}(y). \tag{7}$$

From the definition of $E_u E_{(u,v)}(y)$ it follows, moreover, that

$$E_{\{u \subset A\}} E_{(u,v)}(y) = E_{\{u \subset A\}} E_u E_{(u,v)}(y). \tag{8}$$

Equation (6) results from equations (7) and (8) and thus proves our statement.
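Equation (5) is the "tower" identity of conditional expectations, and it too can be checked on a sample. In the minimal Python sketch below (ours; the discrete variables $u$, $v$ and the form of $y$ are arbitrary choices), averaging $y$ directly over $\{u = u_0\}$ agrees with first averaging within each cell $\{u = u_0, v = v_0\}$ and then averaging the cell means.

```python
import random

random.seed(0)
N = 300_000
data = []
for _ in range(N):
    u = random.randint(0, 1)
    v = random.randint(0, 2)
    y = u + v + random.random()       # y depends on u, v and extra chance
    data.append((u, v, y))

def mean(vals):
    return sum(vals) / len(vals)

for u0 in (0, 1):
    rows = [y for u, v, y in data if u == u0]
    direct = mean(rows)                       # E_u(y) on {u = u0}
    staged = 0.0                              # E_u E_(u,v)(y) on {u = u0}
    for v0 in (0, 1, 2):
        cell = [y for u, v, y in data if u == u0 and v == v0]
        staged += (len(cell) / len(rows)) * mean(cell)
    print(u0, round(direct, 4), round(staged, 4))   # the two coincide
```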

If we set $y$ equal to one on $B$ and to zero outside of $B$, then $E_u(y) = P_u(B)$ and

$$E_{(u,v)}(y) = P_{(u,v)}(B).$$

In this case, from formula (5) we obtain the formula

$$E_u P_{(u,v)}(B) = P_u(B). \tag{9}$$

The conditional mathematical expectation $E_u(y)$ may also be defined directly by means of the corresponding conditional probabilities. To do this we consider the following sums:

$$S_\lambda(u) = \sum_{k=-\infty}^{k=+\infty} k\lambda\,P_u\{k\lambda \leq y < (k+1)\lambda\} = \sum_k R_k. \tag{10}$$

If $E(y)$ exists, the series (10) almost certainly* converges. For we have from formula (3) of § 1,

$$E|R_k| = |k\lambda|\,P\{k\lambda \leq y < (k+1)\lambda\},$$

and the convergence of the series

$$\sum_k |k\lambda|\,P\{k\lambda \leq y < (k+1)\lambda\} = \sum_k E|R_k|$$



* We use almost certainly interchangeably with almost surely.




is the necessary condition for the existence of $E(y)$ (see Chap. IV, § 1). From this convergence it follows that the series (10) converges almost certainly (see Chap. IV, § 2, V). We can further show, exactly as in the theory of the Lebesgue integral, that from the convergence of (10) for some $\lambda$, its convergence for every $\lambda$ follows, and that in the case where series (10) converges, $S_\lambda(u)$ tends to a definite limit as $\lambda \to 0$³. We can then define

$$E_u(y) = \lim_{\lambda \to 0} S_\lambda(u). \tag{11}$$

To prove that the conditional expectation $E_u(y)$ defined by relation (11) satisfies the requirements set forth above, we need only convince ourselves that $E_u(y)$, as determined by (11), satisfies equation (1). We prove this fact thus:

$$E_{\{u \subset A\}} E_u(y) = \lim_{\lambda \to 0} E_{\{u \subset A\}} S_\lambda(u) = \lim_{\lambda \to 0} \sum_{k=-\infty}^{k=+\infty} k\lambda\,P_{\{u \subset A\}}\{k\lambda \leq y < (k+1)\lambda\} = E_{\{u \subset A\}}(y).$$

The interchange of the mathematical expectation sign with the limit sign is admissible in this computation, since $S_\lambda(u)$ converges uniformly to $E_u(y)$ as $\lambda \to 0$ (a simple result of Property V of mathematical expectation in § 2). The interchange of the mathematical expectation sign and the summation sign is also admissible since the series

$$\sum_{k=-\infty}^{k=+\infty} E_{\{u \subset A\}}\{|k\lambda|\,P_u[k\lambda \leq y < (k+1)\lambda]\} = \sum_k |k\lambda|\,P_{\{u \subset A\}}[k\lambda \leq y < (k+1)\lambda]$$

converges (an immediate result of Property V of mathematical expectation).

Instead of (11) we may write

$$E_u(y) = \int_E y\,P_u(dE). \tag{12}$$

We must not forget here, however, that (12) is not an integral

³ In this case we consider only a countable sequence of values of $\lambda$; then all probabilities $P_u\{k\lambda \leq y < (k+1)\lambda\}$ are almost certainly defined for all these values of $\lambda$.




in the sense of § 1, Chap. IV, so that (12) is only a symbolic 
expression. 

If $x$ is a random variable, then we call the function of $x$ and $a$

$$F_x^{(y)}(a) = P_x(y < a)$$

the conditional distribution function of $y$ for known $x$.

$F_x^{(y)}(a)$ is almost certainly defined for every $a$. If $a < b$ then almost certainly

$$F_x^{(y)}(a) \leq F_x^{(y)}(b).$$

From (11) and (10) it follows⁴ that almost certainly

$$E_x(y) = \lim_{\lambda \to 0} \sum_{k=-\infty}^{k=+\infty} k\lambda\,[F_x^{(y)}((k+1)\lambda) - F_x^{(y)}(k\lambda)]. \tag{13}$$

This fact can be expressed symbolically by the formula

$$E_x(y) = \int_{-\infty}^{+\infty} a\,dF_x^{(y)}(a). \tag{14}$$

By means of the new definition of mathematical expectation [(10) and (11)] it is easy to prove that, for a real function $f(u)$ of $u$,

$$E_u[f(u)\,y] = f(u)\,E_u(y). \tag{15}$$

⁴ Cf. footnote 3.



Chapter VI 

INDEPENDENCE; THE LAW OF LARGE NUMBERS 

§ 1. Independence 

Definition 1: Two functions, $u$ and $v$, of $\xi$ are mutually independent if for any two sets, $A$ of $\mathfrak{F}^{(u)}$ and $B$ of $\mathfrak{F}^{(v)}$, the following equation holds:

$$P(u \subset A, v \subset B) = P(u \subset A)\,P(v \subset B) = P^{(u)}(A)\,P^{(v)}(B). \tag{1}$$

If the sets $E^{(u)}$ and $E^{(v)}$ consist of only a finite number of elements,

$$E^{(u)} = u_1 + u_2 + \cdots + u_n,$$
$$E^{(v)} = v_1 + v_2 + \cdots + v_m,$$

then our definition of independence of $u$ and $v$ is identical with the definition of independence of the partitions

$$E = \sum_k \{u = u_k\}, \qquad E = \sum_k \{v = v_k\},$$

as in § 5, Chap. I.

For the independence of $u$ and $v$, the following condition is necessary and sufficient: for any choice of set $A$ in $\mathfrak{F}^{(u)}$, the following equation holds almost certainly:

$$P_v(u \subset A) = P(u \subset A). \tag{2}$$

In the case $P^{(v)}(B) = 0$, both equations (1) and (2) are satisfied, and therefore we need only prove their equivalence in the case $P^{(v)}(B) > 0$. In this case (1) is equivalent to the relation

$$P_{\{v \subset B\}}(u \subset A) = P(u \subset A) \tag{3}$$

and therefore to the relation

$$E_{\{v \subset B\}} P_v(u \subset A) = P(u \subset A). \tag{4}$$

On the other hand, it is obvious that equation (4) follows from




(2). Conversely, since $P_v(u \subset A)$ is uniquely determined by (4) to within probability zero, equation (2) follows from (4) almost certainly.

Definition 2: Let $M$ be a set of functions $u_\mu(\xi)$ of $\xi$. These functions are called mutually independent in their totality if the following condition is satisfied. Let $M'$ and $M''$ be two non-intersecting subsets of $M$, and let $A'$ (or $A''$) be a set from $\mathfrak{F}$ defined by a relation among the $u_\mu$ from $M'$ (or $M''$); then we have

$$P(A'A'') = P(A')\,P(A'').$$

The aggregate of all $u_\mu$ of $M'$ (or of $M''$) can be regarded as coordinates of some function $u'$ (or $u''$). Definition 2 requires only the independence of $u'$ and $u''$ in the sense of Definition 1 for each choice of non-intersecting sets $M'$ and $M''$.

If $u_1, u_2, \ldots, u_n$ are mutually independent, then in all cases

$$P\{u_1 \subset A_1, u_2 \subset A_2, \ldots, u_n \subset A_n\} = P(u_1 \subset A_1)\,P(u_2 \subset A_2) \cdots P(u_n \subset A_n), \tag{5}$$

provided the sets $A_k$ belong to the corresponding $\mathfrak{F}^{(u_k)}$ (proved by induction). This equation is not in general, however, at all sufficient for the mutual independence of $u_1, u_2, \ldots, u_n$.

Equation (5) is easily generalized for the case of a countably infinite product.

From the mutual independence of the $u_\mu$ in each finite group $(u_{\mu_1}, u_{\mu_2}, \ldots, u_{\mu_k})$ it does not necessarily follow that all $u_\mu$ are mutually independent.

Finally, it is easy to note that the mutual independence of the functions $u_\mu$ is in reality a property of the corresponding partitions $\mathfrak{A}_{u_\mu}$. Further, if $u'_\mu$ are single-valued functions of the corresponding $u_\mu$, then from the mutual independence of the $u_\mu$ follows that of the $u'_\mu$.



§ 2. Independent Random Variables 

If $x_1, x_2, \ldots, x_n$ are mutually independent random variables, then from equation (2) of the foregoing paragraph follows, in particular, the formula

$$F^{(x_1, x_2, \ldots, x_n)}(a_1, a_2, \ldots, a_n) = F^{(x_1)}(a_1)\,F^{(x_2)}(a_2) \cdots F^{(x_n)}(a_n). \tag{1}$$

If in this case the field $\mathfrak{F}^{(x_1, x_2, \ldots, x_n)}$ consists only of Borel sets of


the space $R^n$, then condition (1) is also sufficient for the mutual independence of the variables $x_1, x_2, \ldots, x_n$.

Proof. Let $x' = (x_{i_1}, x_{i_2}, \ldots, x_{i_k})$ and $x'' = (x_{j_1}, x_{j_2}, \ldots, x_{j_m})$ be two non-intersecting subsystems of the variables $x_1, x_2, \ldots, x_n$. We must show, on the basis of formula (1), that for every two Borel sets $A'$ and $A''$ of $R^k$ (or $R^m$) the following equation holds:

$$P(x' \subset A', x'' \subset A'') = P(x' \subset A')\,P(x'' \subset A''). \tag{2}$$

This follows at once from (1) for the sets of the form

$$A' = \{x_{i_1} < a_1, x_{i_2} < a_2, \ldots, x_{i_k} < a_k\},$$
$$A'' = \{x_{j_1} < b_1, x_{j_2} < b_2, \ldots, x_{j_m} < b_m\}.$$

It can be shown that this property of the sets $A'$ and $A''$ is preserved under formation of sums and differences, from which equation (2) follows for all Borel sets.

Now let $x = \{x_\mu\}$ be an arbitrary (in general infinite) aggregate of random variables. If the field $\mathfrak{F}^{(x)}$ coincides with the field $B\mathfrak{F}^M$ ($M$ is the set of all $\mu$), the aggregate of equations

$$F_{\mu_1 \mu_2 \ldots \mu_n}(a_1, a_2, \ldots, a_n) = F_{\mu_1}(a_1)\,F_{\mu_2}(a_2) \cdots F_{\mu_n}(a_n) \tag{3}$$

is necessary and sufficient for the mutual independence of the variables $x_\mu$.

The necessity of this condition follows at once from formula (1). We shall now prove that it is also sufficient. Let $M'$ and $M''$ be two non-intersecting subsets of the set $M$ of all indices $\mu$, and let $A'$ (or $A''$) be a set of $B\mathfrak{F}^M$ defined by a relation among the $x_\mu$ with indices $\mu$ from $M'$ (or $M''$). We must show that we then have

$$P(A'A'') = P(A')\,P(A''). \tag{4}$$

If $A'$ and $A''$ are cylinder sets, then we are dealing with relations among a finite set of variables $x_\mu$; equation (4) represents in that case a simple consequence of previous results (Formula (2)). And since relation (4) holds for sums and differences of sets $A'$ (or $A''$) also, we have proved (4) for all sets of $B\mathfrak{F}^M$ as well.

Now for every $\mu$ of a set $M$ let there be given a priori a distribution function $F_\mu(a)$; in that case we can construct a field of probability such that certain random variables $x_\mu$ in that field ($\mu$ assuming all values in $M$) will be mutually independent, where $x_\mu$ will have for its distribution function the $F_\mu(a)$ given a priori.




In order to show this it is enough to take $R^M$ for the basic set $E$ and $B\mathfrak{F}^M$ for the field $\mathfrak{F}$, and to define the distribution functions $F_{\mu_1 \mu_2 \ldots \mu_n}$ (see Chap. III, § 4) by equation (3).

Let us also note that from the mutual independence of each finite group of variables $x_\mu$ (equation (3)) there follows, as we have seen above, the mutual independence of all $x_\mu$ on $B\mathfrak{F}^M$. In more inclusive fields of probability this property may be lost.

To conclude this section, we shall give a few more criteria for 
the independence of two random variables. 

If two random variables $x$ and $y$ are mutually independent and if $E(x)$ and $E(y)$ are finite, then almost certainly

$$E_x(y) = E(y), \qquad E_y(x) = E(x). \tag{5}$$

These formulas represent an immediate consequence of the second definition of conditional mathematical expectation (Formulas (10) and (11) of Chap. V, § 4). Therefore, in the case of independence both

$$f^2 = 1 - \frac{E[y - E_x(y)]^2}{\sigma^2(y)} \qquad \text{and} \qquad g^2 = 1 - \frac{E[x - E_y(x)]^2}{\sigma^2(x)}$$

are equal to zero (provided $\sigma^2(x) > 0$ and $\sigma^2(y) > 0$). The number $f^2$ is called the correlation ratio of $y$ with respect to $x$, and $g^2$ the same for $x$ with respect to $y$ (Pearson).

From (5) it further follows that

$$E(xy) = E(x)\,E(y). \tag{6}$$

To prove this we apply Formula (15) of § 4, Chap. V:

$$E(xy) = E\,E_x(xy) = E[x\,E_x(y)] = E[x\,E(y)] = E(y)\,E(x).$$

Therefore, in the case of independence

$$r = \frac{E(xy) - E(x)\,E(y)}{\sigma(x)\,\sigma(y)}$$

is also equal to zero; $r$, as is well known, is the correlation coefficient of $x$ and $y$.

If two random variables $x$ and $y$ satisfy equation (6), then they are called uncorrelated. For the sum

$$s = x_1 + x_2 + \cdots + x_n,$$


where the $x_1, x_2, \ldots, x_n$ are uncorrelated in pairs, we can easily compute that

$$\sigma^2(s) = \sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n). \tag{7}$$

In particular, equation (7) holds for the independent variables $x_k$.
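Both (6) and (7) are easily confirmed on simulated data. In the Python sketch below (ours; the three distributions are arbitrary choices), independent samples are checked for the product rule of expectations and for the additivity of variances.

```python
import random

random.seed(0)
N = 500_000

def var(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

# Independent summands with different distributions (arbitrary choices).
x1 = [random.gauss(0.0, 1.0) for _ in range(N)]
x2 = [random.uniform(-1.0, 1.0) for _ in range(N)]
x3 = [random.expovariate(2.0) for _ in range(N)]

# E(xy) = E(x)E(y) for independent x, y:
Exy = sum(a * b for a, b in zip(x1, x2)) / N
print(round(Exy, 4), "~", round((sum(x1) / N) * (sum(x2) / N), 4))

# sigma^2(x1 + x2 + x3) = sigma^2(x1) + sigma^2(x2) + sigma^2(x3):
s = [a + b + c for a, b, c in zip(x1, x2, x3)]
print(round(var(s), 4), "~", round(var(x1) + var(x2) + var(x3), 4))
```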

§ 3. The Law of Large Numbers 

Random variables $s_n$ of a sequence

$$s_1, s_2, \ldots, s_n, \ldots$$

are called stable if there exists a numerical sequence

$$d_1, d_2, \ldots, d_n, \ldots$$

such that for any positive $\varepsilon$

$$P\{|s_n - d_n| \geq \varepsilon\}$$

converges to zero as $n \to \infty$. If all $E(s_n)$ exist and if we may set

$$d_n = E(s_n),$$

then the stability is normal.

If all $s_n$ are uniformly bounded, then from

$$P\{|s_n - d_n| \geq \varepsilon\} \to 0, \qquad n \to +\infty, \tag{1}$$

we obtain the relation

$$|E(s_n) - d_n| \to 0, \qquad n \to +\infty,$$

and therefore

$$P\{|s_n - E(s_n)| \geq \varepsilon\} \to 0, \qquad n \to +\infty. \tag{2}$$

The stability of a bounded stable sequence is thus necessarily normal.

Let

$$E(s_n - E(s_n))^2 = \sigma^2(s_n) = \sigma_n^2.$$

According to the Tchebycheff inequality,

$$P\{|s_n - E(s_n)| \geq \varepsilon\} \leq \frac{\sigma_n^2}{\varepsilon^2}.$$

Therefore, the Markov Condition

$$\sigma_n^2 \to 0, \qquad n \to +\infty, \tag{3}$$

is sufficient for normal stability.




If the $s_n - E(s_n)$ are uniformly bounded:

$$|s_n - E(s_n)| \leq M,$$

then from the inequality (9) in § 3, Chap. IV,

$$P\{|s_n - E(s_n)| \geq \varepsilon\} \geq \frac{\sigma_n^2 - \varepsilon^2}{M^2}.$$

Therefore, in this case the Markov condition (3) is also necessary for the stability of the $s_n$.

If

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

and the variables $x_n$ are uncorrelated in pairs, we have

$$\sigma_n^2 = \frac{1}{n^2}\{\sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n)\}.$$

Therefore, in this case, the following condition is sufficient for the normal stability of the arithmetical means $s_n$:

$$\sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n) = o(n^2) \tag{4}$$

(Theorem of Tchebycheff). In particular, condition (4) is fulfilled if all the variables $x_n$ are uniformly bounded.
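Tchebycheff's theorem can be watched directly. In the Python sketch below (ours; the uniform summands, the tolerance, and the number of repetitions are arbitrary choices), the frequency of a large deviation of the arithmetic mean falls with $n$, in agreement with the bound $\sigma_n^2/\varepsilon^2 = 1/(12 n \varepsilon^2)$.

```python
import random

random.seed(0)
# Bounded, independent (hence pairwise uncorrelated) summands:
# x_k uniform on [0, 1], E(x_k) = 1/2, sigma^2(x_k) = 1/12.
eps = 0.05
for n in (100, 1000, 10000):
    trials = 1000
    far = 0
    for _ in range(trials):
        s_n = sum(random.random() for _ in range(n)) / n
        far += abs(s_n - 0.5) >= eps
    print(n, far / trials, "bound:", round(1.0 / (12 * n * eps ** 2), 4))
# The observed frequency of |s_n - E(s_n)| >= eps drops toward zero.
```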

This theorem can be generalized for the case of weakly correlated variables $x_n$. If we assume that the coefficient of correlation $r_{mn}$¹ of $x_m$ and $x_n$ satisfies the inequality

$$r_{mn} \leq c(|n - m|)$$

and that

$$C_n = \sum_{k=0}^{k=n} c(k),$$

then a sufficient condition for normal stability of the arithmetic means $s_n$ is²

$$C_n\{\sigma^2(x_1) + \sigma^2(x_2) + \cdots + \sigma^2(x_n)\} = o(n^2). \tag{5}$$

In the case of independent summands $x_n$ we can state a necessary and sufficient condition for the stability of the arithmetic means $s_n$. For every $x_n$ there exists a constant $m_n$ (the median of $x_n$) which satisfies the following conditions:

$$P(x_n < m_n) \leq \tfrac{1}{2}, \qquad P(x_n > m_n) \leq \tfrac{1}{2}.$$

¹ It is obvious that $r_{nn} = 1$ always.

² Cf. A. Khintchine, Sur la loi forte des grandes nombres, C. R. de l'Acad. Sci. Paris v. 186, 1928, p. 285.




We set

$$x_{nk} = x_k \quad \text{if} \quad |x_k - m_k| \leq n, \qquad x_{nk} = 0 \quad \text{otherwise},$$

$$s_n^* = \frac{x_{n1} + x_{n2} + \cdots + x_{nn}}{n}.$$

Then the relations

$$\sum_{k=1}^{k=n} P\{|x_k - m_k| > n\} = \sum_{k=1}^{k=n} P(x_{nk} \neq x_k) \to 0, \qquad n \to +\infty, \tag{6}$$

$$\sigma^2(n s_n^*) = \sum_{k=1}^{k=n} \sigma^2(x_{nk}) = o(n^2) \tag{7}$$

are necessary and sufficient for the stability of the variables $s_n$³.

We may here assume the constants $d_n$ to be equal to the $E(s_n^*)$, so that in the case where

$$E(s_n^*) - E(s_n) \to 0, \qquad n \to +\infty$$

(and only in this case) the stability is normal.

A further generalization of Tchebycheff's theorem is obtained if we assume that the $s_n$ depend in some way upon the results of any $n$ trials,

$$\mathfrak{A}_1, \mathfrak{A}_2, \ldots, \mathfrak{A}_n,$$

so that after each definite outcome of all these $n$ trials $s_n$ assumes a definite value. The general idea of all these theorems, known as the law of large numbers, consists in the fact that if the dependence of the variables $s_n$ upon each separate trial $\mathfrak{A}_k$ ($k = 1, 2, \ldots, n$) is very small for a large $n$, then the variables $s_n$ are stable. If we regard

$$\beta_{nk} = E[E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n)]^2$$

as a reasonable measure of the dependence of the variables $s_n$ upon the trial $\mathfrak{A}_k$, then the above-mentioned general idea of the law of large numbers can be made concrete by the following considerations⁴. Let

$$z_{nk} = E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n).$$



³ Cf. A. Kolmogorov, Über die Summen durch den Zufall bestimmter unabhängiger Größen, Math. Ann. v. 99, 1928, pp. 309-319 (corrections and notes to this study, v. 102, 1929, pp. 484-488, Theorem VIII and a supplement on p. 318).

⁴ Cf. A. Kolmogorov, Sur la loi des grandes nombres, Rend. Accad. Lincei v. 9, 1929, pp. 470-474.




Then

$$s_n - E(s_n) = z_{n1} + z_{n2} + \cdots + z_{nn},$$

$$E(z_{nk}) = E\,E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_k}(s_n) - E\,E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n) = E(s_n) - E(s_n) = 0,$$

$$\sigma^2(z_{nk}) = E(z_{nk}^2) = \beta_{nk}.$$

We can easily compute also that the random variables $z_{nk}$ ($k = 1, 2, \ldots, n$) are uncorrelated. For let $i < k$; then⁵

$$E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(z_{ni} z_{nk}) = z_{ni}\,E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(z_{nk}) = z_{ni}\,[E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n) - E_{\mathfrak{A}_1 \mathfrak{A}_2 \ldots \mathfrak{A}_{k-1}}(s_n)] = 0,$$

and therefore

$$E(z_{ni} z_{nk}) = 0.$$

We thus have

$$\sigma^2(s_n) = \sigma^2(z_{n1}) + \sigma^2(z_{n2}) + \cdots + \sigma^2(z_{nn}) = \beta_{n1} + \beta_{n2} + \cdots + \beta_{nn}.$$

Therefore, the condition

$$\beta_{n1} + \beta_{n2} + \cdots + \beta_{nn} \to 0, \qquad n \to +\infty,$$

is sufficient for the normal stability of the variables $s_n$.

§ 4. Notes on the Concept of Mathematical Expectation 

We have defined the mathematical expectation of a random variable $x$ as

$$E(x) = \int_E x\,P(dE) = \int_{-\infty}^{+\infty} a\,dF^{(x)}(a),$$

where the integral on the right is understood as

$$E(x) = \int_{-\infty}^{+\infty} a\,dF^{(x)}(a) = \lim_{\substack{b' \to -\infty \\ b'' \to +\infty}} \int_{b'}^{b''} a\,dF^{(x)}(a). \tag{1}$$

The idea suggests itself to consider the expression

$$E^*(x) = \lim_{b \to +\infty} \int_{-b}^{+b} a\,dF^{(x)}(a) \tag{2}$$



⁵ Application of Formula (15) in § 4, Chap. V.




as a generalized mathematical expectation. We lose in this case, of course, several simple properties of mathematical expectation. For example, in this case the formula

$$E(x + y) = E(x) + E(y)$$

is not always true. In this form the generalization is hardly admissible. We may add, however, that with some restrictive supplementary conditions definition (2) becomes entirely natural and useful.

We can discuss the problem as follows. Let

$$x_1, x_2, \ldots, x_n, \ldots$$

be a sequence of mutually independent variables having the same distribution function $F^{(x)}(a) = F^{(x_n)}(a)$, $(n = 1, 2, \ldots)$, as $x$. Let further

$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n}.$$

We now ask whether there exists a constant $E^*(x)$ such that for every $\varepsilon > 0$

$$\lim_n P\{|s_n - E^*(x)| > \varepsilon\} = 0, \qquad n \to +\infty. \tag{3}$$

The answer is: if such a constant $E^*(x)$ exists, it is expressed by Formula (2). The necessary and sufficient condition that Formula (3) hold consists in the existence of limit (2) and the relation

$$P(|x| > n) = o\left(\frac{1}{n}\right). \tag{4}$$

To prove this we apply the theorem that condition (4) is necessary and sufficient for the stability of the arithmetic means $s_n$, where, in the case of stability, we may set⁶

$$d_n = \int_{-n}^{+n} a\,dF^{(x)}(a).$$

If there exists a mathematical expectation in the former sense (Formula (1)), then condition (4) is always fulfilled⁷. Since in this case $E(x) = E^*(x)$, the condition (3) actually does define a generalization of the concept of mathematical expectation. For the generalized mathematical expectation, Properties I-VII


⁶ Cf. A. Kolmogorov, Bemerkungen zu meiner Arbeit "Über die Summen zufälliger Größen", Math. Ann. v. 102, 1929, pp. 484-488, Theorem XII.

⁷ Ibid., Theorem XIII.




(Chap. IV, §2) still hold; in general, however, the existence of 
E*| x | does not follow from the existence of E*(#). 

To prove that the new concept of mathematical expectation 
is really more general than the previous one, it is sufficient to 
give the following example. Set the probability density f (x) (a) 
equal to 

Q 

f{X){a) = (|«| + 2)«ln(|ii| + 2) ' 

where the constant C is determined by 

+00 

ff&{a)da = i . 



It is easy to compute that in this case condition (4) is fulfilled. Formula (2) gives the value
$$\mathsf{E}^{*}(x) = 0,$$
but the integral
$$\int_{-\infty}^{+\infty} |a| \, dF^{(x)}(a) = \int_{-\infty}^{+\infty} |a| f^{(x)}(a) \, da$$
diverges.
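The computations just alluded to run as follows (a sketch filled in here, using only the density above). For condition (4),
$$\mathsf{P}(|x| > n) = 2\int_{n}^{+\infty} \frac{C \, da}{(a+2)^2 \ln(a+2)} \le \frac{2C}{\ln(n+2)} \int_{n}^{+\infty} \frac{da}{(a+2)^2} = \frac{2C}{(n+2)\ln(n+2)} = o\!\left(\frac{1}{n}\right).$$
E*(x) = 0 because the density is even, so that every symmetric integral in (2) vanishes. Finally,
$$\int_{-\infty}^{+\infty} |a| f^{(x)}(a) \, da = 2C \int_{0}^{+\infty} \frac{a \, da}{(a+2)^2 \ln(a+2)} = +\infty,$$
since the integrand behaves like 1/(a ln a) for large a, and ∫ da/(a ln a) = ln ln a increases without bound.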

§ 5. Strong Law of Large Numbers; Convergence of Series 

The random variables s_n of the sequence
$$s_1, s_2, \ldots, s_n, \ldots$$
are strongly stable if there exists a sequence of numbers
$$d_1, d_2, \ldots, d_n, \ldots$$
such that the random variables
$$s_n - d_n$$
almost certainly tend to zero as n → +∞. From strong stability follows, obviously, ordinary stability. If we can choose
$$d_n = \mathsf{E}(s_n),$$
then the strong stability is normal.

In the Tchebycheff case,
$$s_n = \frac{x_1 + x_2 + \cdots + x_n}{n},$$




where the variables x_n are mutually independent. A sufficient⁸ condition for the normal strong stability of the arithmetic means s_n is the convergence of the series
$$\sum_{n=1}^{\infty} \frac{\sigma^2(x_n)}{n^2}. \tag{1}$$

This condition is the best in the sense that for any series of constants b_n such that
$$\sum_{n=1}^{\infty} \frac{b_n}{n^2} = +\infty,$$
we can build a series of mutually independent random variables x_n such that
$$\sigma^2(x_n) = b_n,$$
and the corresponding arithmetic means s_n will not be strongly stable.
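An immediate consequence of (1), noted here for illustration: if the variances are uniformly bounded, say σ²(x_n) ≤ C for all n, then
$$\sum_{n=1}^{\infty} \frac{\sigma^2(x_n)}{n^2} \le C \sum_{n=1}^{\infty} \frac{1}{n^2} < +\infty,$$
so that the arithmetic means of any such sequence of mutually independent variables are normally strongly stable.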

If all x_n have the same distribution function F^{(x)}(a), then the existence of the mathematical expectation
$$\mathsf{E}(x) = \int_{-\infty}^{+\infty} a \, dF^{(x)}(a)$$
is necessary and sufficient for the strong stability of s_n; the stability in this case is always normal⁹.
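The following minimal Monte Carlo sketch illustrates this statement numerically (an illustration added here, not part of the argument; the exponential distribution, the seed, and the sample size are arbitrary choices, and a finite simulation can only suggest almost certain convergence):

    # Arithmetic means of i.i.d. variables with E(x) = 2: by the strong
    # law of large numbers, s_n should settle down to E(x) along almost
    # every sample sequence.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=2.0, size=100_000)  # i.i.d. sample, E(x) = 2
    s = np.cumsum(x) / np.arange(1, x.size + 1)   # arithmetic means s_n

    for n in (100, 10_000, 100_000):
        print(n, s[n - 1])                        # drifts toward E(x) = 2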
Again, let
$$x_1, x_2, \ldots, x_n, \ldots$$
be mutually independent random variables. Then the probability of convergence of the series
$$\sum_{n=1}^{\infty} x_n \tag{2}$$
is equal either to one or to zero. In particular, this probability equals one when both series
$$\sum_{n=1}^{\infty} \mathsf{E}(x_n) \quad \text{and} \quad \sum_{n=1}^{\infty} \sigma^2(x_n)$$

converge. Let us further assume
$$y_n = x_n \ \text{in case}\ |x_n| \le 1, \qquad y_n = 0 \ \text{in case}\ |x_n| > 1.$$



8 Cf. A. Kolmogorov, Sur la loi forte des grands nombres, C. R. Acad. Sci. Paris v. 191, 1930, pp. 910-911.

9 The proof of this statement has not yet been published. 




Then in order that series (2) converge with the probability one, it is necessary and sufficient¹⁰ that the following series converge simultaneously:
$$\sum_{n=1}^{\infty} \mathsf{P}\{|x_n| > 1\}, \qquad \sum_{n=1}^{\infty} \mathsf{E}(y_n) \qquad \text{and} \qquad \sum_{n=1}^{\infty} \sigma^2(y_n).$$
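As an illustration (a standard example, worked out here rather than in the text), consider the series with independent random signs
$$\sum_{n=1}^{\infty} \frac{\varepsilon_n}{n}, \qquad \mathsf{P}\{\varepsilon_n = +1\} = \mathsf{P}\{\varepsilon_n = -1\} = \tfrac{1}{2}.$$
Here |x_n| = 1/n ≤ 1, so that y_n = x_n and the first series vanishes term by term; further Σ E(y_n) = 0, and
$$\sum_{n=1}^{\infty} \sigma^2(y_n) = \sum_{n=1}^{\infty} \frac{1}{n^2} < +\infty,$$
so that all three series converge and the series of random signs converges with the probability one.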



10 Cf. A. Khintchine and A. Kolmogorov, On the Convergence of Series, Rec. Math. Soc. Moscow, v. 32, 1925, pp. 668-677.



Appendix 

ZERO-OR-ONE LAW IN THE THEORY 
OF PROBABILITY 

We have noticed several cases in which certain limiting 
probabilities are necessarily equal to zero or one. For example, 
the probability of convergence of a series of independent random 
variables may assume only these two values¹. We shall prove now
a general theorem including many such cases. 

Theorem: Let x_1, x_2, ..., x_n, ... be any random variables and let f(x_1, x_2, ..., x_n, ...) be a Baire function² of the variables x_1, x_2, ..., x_n, ... such that the conditional probability
$$\mathsf{P}_{x_1 x_2 \ldots x_n}\{f(x) = 0\}$$
of the relation
$$f(x_1, x_2, \ldots, x_n, \ldots) = 0$$
remains, when the first n variables x_1, x_2, ..., x_n are known, equal to the absolute probability
$$\mathsf{P}\{f(x) = 0\} \tag{1}$$
for every n. Under these conditions the probability (1) equals zero or one.

In particular, the assumptions of this theorem are fulfilled if 
the variables x n are mutually independent and if the value of the 
function f(x) remains unchanged when only a finite number of 
variables are changed. 
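For example (an instance written out here), let the variables x_n be mutually independent and put
$$f(x_1, x_2, \ldots) = \limsup_{n \to \infty} \min(|x_n|, 1),$$
a Baire function which vanishes when and only when x_n → 0, and whose value is unchanged when any finite number of the variables are changed; the theorem then shows that the probability P{x_n → 0} equals zero or one.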

Proof of the Theorem: Let us denote by A the event
$$f(x) = 0.$$
We shall also investigate the field 𝔄 of all events which can be defined through some relations among a finite number of variables x_n.


1 Cf. Chap. VI, § 5. The same thing is true of the probability
$$\mathsf{P}\{s_n - d_n \to 0\}$$
in the strong law of large numbers; at least, when the variables x_n are mutually independent.

2 A Baire function is one which can be obtained by successive passages to the limit of sequences of functions, starting with polynomials.





If event B belongs to 𝔄, then, according to the conditions of the theorem,
$$\mathsf{P}_{B}(A) = \mathsf{P}(A). \tag{2}$$
In the case P(A) = 0 our theorem is already true. Let now P(A) > 0. Then from (2) follows the formula
$$\mathsf{P}_{A}(B) = \frac{\mathsf{P}(AB)}{\mathsf{P}(A)} = \mathsf{P}(B), \tag{3}$$
and therefore P(B) and P_A(B) are two completely additive set functions, coinciding on 𝔄; therefore they must remain equal to each other on every set of the Borel extension B𝔄 of the field 𝔄. Therefore, in particular,
$$\mathsf{P}(A) = \mathsf{P}_{A}(A) = 1,$$
which proves our theorem.

Several other cases in which we can state that certain probabilities can assume only the values one and zero were discovered by P. Lévy. See P. Lévy, Sur un théorème de M. Khintchine, Bull. des Sci. Math. v. 55, 1931, pp. 145-160, Theorem II.



BIBLIOGRAPHY

[1]. Bernstein, S.: On the axiomatic foundation of the theory of probability. (In Russian). Mitt. Math. Ges. Charkov, 1917, pp. 209-274.
[2]. — Theory of probability, 2nd edition. (In Russian). Moscow, 1927. Government publication RSFSR.
[1]. Borel, E.: Les probabilités dénombrables et leurs applications arithmétiques. Rend. Circ. mat. Palermo Vol. 27 (1909) pp. 247-271.
[2]. — Principes et formules classiques, fasc. 1 du tome I du Traité des probabilités par E. Borel et divers auteurs. Paris: Gauthier-Villars 1925.
[3]. — Applications à l'arithmétique et à la théorie des fonctions, fasc. 1 du tome II du Traité des probabilités par E. Borel et divers auteurs. Paris: Gauthier-Villars 1926.
[1]. Cantelli, F. P.: Una teoria astratta del Calcolo delle probabilità. Giorn. Ist. Ital. Attuari Vol. 3 (1932) pp. 257-265.
[2]. — Sulla legge dei grandi numeri. Mem. Acad. Lincei Vol. 11 (1916).
[3]. — Sulla probabilità come limite della frequenza. Rend. Accad. Lincei Vol. 26 (1917) pp. 39-45.
[1]. Copeland, A. H.: The theory of probability from the point of view of admissible numbers. Ann. Math. Statist. Vol. 3 (1932) pp. 143-156.
[1]. Dörge, K.: Zu der von R. von Mises gegebenen Begründung der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 32 (1930) pp. 232-258.
[1]. Fréchet, M.: Sur la convergence en probabilité. Metron Vol. 8 (1930) pp. 1-48.
[2]. — Recherches théoriques modernes, fasc. 3 du tome I du Traité des probabilités par E. Borel et divers auteurs. Paris: Gauthier-Villars.
[1]. Kolmogorov, A.: Über die analytischen Methoden in der Wahrscheinlichkeitsrechnung. Math. Ann. Vol. 104 (1931) pp. 415-458.
[2]. — The general theory of measure and the theory of probability. (In Russian). Sbornik trudow sektii totshnych nauk K. A., Vol. 1 (1929) pp. 8-21.
[1]. Lévy, P.: Calcul des probabilités. Paris: Gauthier-Villars.
[1]. Lomnicki, A.: Nouveaux fondements du calcul des probabilités. Fundam. Math. Vol. 4 (1923) pp. 34-71.
[1]. Mises, R. v.: Wahrscheinlichkeitsrechnung. Leipzig u. Wien: Fr. Deuticke 1931.
[2]. — Grundlagen der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 5 (1919) pp. 52-99.
[3]. — Wahrscheinlichkeitsrechnung, Statistik und Wahrheit. Wien: Julius Springer 1928.
[3']. — Probability, Statistics and Truth (translation of above). New York: The MacMillan Company 1939.
[1]. Reichenbach, H.: Axiomatik der Wahrscheinlichkeitsrechnung. Math. Z. Vol. 34 (1932) pp. 568-619.
[1]. Slutsky, E.: Über stochastische Asymptoten und Grenzwerte. Metron Vol. 5 (1925) pp. 3-89.
[2]. — On the question of the logical foundation of the theory of probability. (In Russian). Westnik Statistiki, Vol. 12 (1922), pp. 13-21.
[1]. Steinhaus, H.: Les probabilités dénombrables et leur rapport à la théorie de la mesure. Fundam. Math. Vol. 4 (1923) pp. 286-310.
[1]. Tornier, E.: Wahrscheinlichkeitsrechnung und Zahlentheorie. J. reine angew. Math. Vol. 160 (1929) pp. 177-198.
[2]. — Grundlagen der Wahrscheinlichkeitsrechnung. Acta math. Vol. 60 (1933) pp. 239-380.




